Abstract Collagens are a still growing family of proteins with an extraordinary
heterogeneity: Today, 27 collagen types are known which are encoded by over 40 different genes.
The diversity not only concerns the molecular assembly and the supramolecular structures 
but is also mirrored by tissue distribution, function and pathology of the collagen types. To 
become familiar with the collagen superfamily and to obtain a quick overview, we compiled 
Table 1 covering the "essentials" of the individual collagens. 
Introduction


Collagens are probably the most abundant proteins in the vertebrate body 
[1-3]. Their molecular hallmarks are the multiple repetitions of Gly-X-Y se- 
quences and the unique triple helical structure built by three polypeptide 
chains. Up to now, 42 different polypeptide chains have been identified, which 
are encoded by 41 specific genes and compose 27 unique collagen types. The 
collagen family can be classified into different subfamilies according to their 
supramolecular assembly. The tissue distribution of the different collagen types 
discloses a remarkable diversity and range, for instance from a more exclusive 
pattern for collagen X in hypertrophic cartilage or collagen II in cartilage and 
vitreous to a ubiquitous occurrence of other fibrillar collagens such as colla- 
gens I and V. Information on the function of various collagens in a given tissue 
has been retrieved from studies of the macromolecular organisation of mu- 
tated collagens in heritable collagen diseases. Table 1 provides backbone 
information on chain composition, molecular and supramolecular assembly, 
as well as tissue distribution and pathology of the currently known collagen
types.
Abstract The collagen triple helix is a widespread structural element, which not only occurs
in collagens but also in many other proteins. The triple helix consists of three identical or 
different polypeptide chains with the absolute requirement of a -Gly-Xaa-Yaa- repeat, in 
which the amino acid residues in X- and Y-position are frequently proline or hydroxyproline.
The freezing of φ-angles in the polypeptide backbone by proline rings and other steric
restrictions are essential for stabilization. The OH-group of 4(R)-hydroxyproline, normally
located in the Y-position, has an additional stabilizing effect. On the other hand, peptide 
bonds preceding proline and hydroxyproline are up to 20% in cis-configuration in unfolded 
chains and the need for a relatively slow cis-trans isomerization provides kinetic difficulties 
in triple helix formation. Because of their repeating structure, collagen chains easily misalign 
during folding. Therefore, oligomerization domains flank triple helical domains in natural
collagens. The mechanism by which these domains influence stability and kinetics was 
elucidated with model peptides using different types of trimerization domains. Finally, the 
review briefly describes mutations in collagen triple helices, which cause severe inherited 
diseases by disturbances of folding. 


Keywords Collagen - Folding - Thermodynamics - Kinetics - Nucleation - Alignment - 
Oligomerization 


List of Abbreviations 

Ac          Acetyl
Ala (A)     L-Alanine
Arg (R)     L-Arginine
Asn (N)     L-Asparagine
Asp (D)     L-Aspartic acid
CD          Circular dichroism
Cys (C)     L-Cysteine
Flp         4(R)-Fluoroproline
Gln (Q)     L-Glutamine
Glu (E)     L-Glutamic acid
Gly (G)     Glycine
His (H)     L-Histidine
Hyp (O)     4(R)-L-Hydroxyproline
Ile (I)     L-Isoleucine
Leu (L)     L-Leucine
Lys (K)     L-Lysine
Met (M)     L-Methionine
OMe         Methyl ester
Phe (F)     L-Phenylalanine
Pro (P)     L-Proline
Ser (S)     L-Serine
Thr (T)     L-Threonine
Tm          Midpoint transition temperature
Trp (W)     L-Tryptophan
Tyr (Y)     L-Tyrosine
Val (V)     L-Valine
Xaa, Yaa    Amino acid residue in X-, Y-position
ΔG°         Standard free enthalpy (Gibbs free energy)
ΔH°cal      Standard calorimetric enthalpy
ΔH°vH       Standard van't Hoff enthalpy
ΔS°         Standard entropy

1 


Structure of the Collagen Triple Helix 


The basic three-dimensional structure of the collagen triple-helix was first 
derived from fiber diffraction studies on collagen in tendon [1-3]. Each of the 
three chains in the molecule forms a left-handed polyproline-II-type helix. The 
name of this helix is derived from poly-L-proline with all peptide bonds in trans
conformation. It may however be formed also by peptides not containing 
proline or hydroxyproline. Its mode of stabilization is discussed later. The three 
helices, staggered by one residue relative to each other, are supercoiled along a 
common axis to a right-handed triple helix (Fig. 1). 

A requirement for the formation of the superhelix is the repeating sequence
-Gly-Xaa-Yaa-. A straight not supercoiled polyproline-II-helix has exactly three
residues per turn and glycine residues are facing each other in the interior of
the triple helix (Fig. 1B). Side chains larger than H at the Cα would prevent
formation of a hydrogen bond between the backbone NH-group of glycine and
the backbone CO of a residue in X-position of a neighboring chain. These
hydrogen bonds are a major source of stability (see below). Isolated
polyproline-II-helices are not stable if the polypeptide chains also contain
residues other than proline and hydroxyproline. In contrast to the right-handed
α-helices, left-handed polyproline-II-helices are relatively rare structural
elements in proteins with the striking exception of collagens.

Fig. 1A-C Model of the collagen triple helix. The structure is shown for (Gly-Pro-Pro)n, in
which glycine is designated by 1, proline in X-position by 2 and proline in Y-position by 3:
A, B side views. Three left-handed polyproline-II-type helices are arranged in parallel. For
clarity the right-handed supercoil of the triple helix is not shown in A but indicated in B.
Dashed lines indicate positions of Cα-atoms (and not hydrogen bonds as in C). All indicated
values for axial repeats correspond to the supercoiled situation; C top view in the direction
of the helix axis. The three chains are connected by hydrogen bonds between the backbone
NH of glycine and the backbone CO of proline in Y-position (dashed lines). Arrows indicate
the directions in which side chains other than proline rings emerge from the helix.
Approximate residue-to-residue distances, repeats of the polyproline-II- and triple helix and
a scale bar are indicated in nm

The fiber diffraction studies, which were mainly performed on tendon fibers,
showed a clear right-handed supercoiling of the three individual chains to a
triple helix, leading to an increase to 3.33 residues per turn and a reduction of
the axial repeat to 0.286 nm per residue, when viewed at the left-handed
individual helices (Fig. 1).

In each of the left handed helices, ten tripeptide units form three turns, 
and these helices were therefore called 10/3 helices. In the right handed triple 
helix the axial repeat is 2.86 nm because an identical structural element reoc- 
curs after ten residues (Fig. 1B). Note that this element is located on a different 
chain. Lateral association of triple helices is very important in fibril formation 
and it should be noted already at this point that the interaction edges critically 
depend on the sequence of all three chains and on the symmetry of the helices. 
The collagen triple helix differs from other multi-stranded helices like coiled-coil
structures or DNA double helices by a staggered arrangement of the three
chains. As can be seen from Fig. 1A, a glycine residue faces a residue in position
X of a neighboring chain and this residue faces a residue in position Y of the 
third chain. Because of the stagger of the chains in collagen, for three different 
chains six alignments are possible. The topoisomerism of the triple helices is 
unknown for most natural collagens. It is however of importance for binding 
processes, in which an array of residues in different chains is recognized [4]. 
The staggered arrangement also leads to a bend between a collagen triple 
helix and a structure with a planar arrangement of chain ends. This bend was
observed between (GlyProPro)10 and foldon [5] and is predicted for scavenger
receptor and MARCO (see below).
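
The count of six alignments can be checked by simple enumeration. The following Python sketch is illustrative only: it treats a register as an assignment of the three chains to the leading, middle and trailing positions of the one-residue stagger, and the chain labels and the function name are assumptions made for this example.

```python
from itertools import permutations


def chain_registers(chains):
    """Return the distinct (leading, middle, trailing) assignments of three chains.

    Identical chains are interchangeable, so duplicates are removed with a set.
    """
    if len(chains) != 3:
        raise ValueError("a triple helix needs exactly three chains")
    return sorted(set(permutations(chains)))


# Three different chains (hypothetical labels): six staggered alignments.
for register in chain_registers(("a1", "a2", "a3")):
    print(register)
```

Under the same assumption, a trimer with two identical chains gives three distinct registers and a homotrimer only one.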

Structural data were supplemented by crystallography of model peptides 
with Gly-Pro-Pro- and related repeats [6-9]. The earliest structure of the synthetic
model peptide (Pro-Pro-Gly)10 showed a 7/2 symmetry, giving rise to
3.5 residues per turn and an axial repeat of 0.2 nm [6]. This is in contrast to the
result of fiber diffraction studies, which as mentioned showed a 10/3 symme- 
try. In 1994, the crystal of a peptide that consists of ten repeats of Pro-Hyp-Gly 
with a single substitution of a glycine residue by an alanine in the middle was 
analyzed [7]. This peptide was a triple helical molecule with a length of 8.7 nm 
and a diameter of about 1 nm. The structure with a resolution of 0.19 nm was 
consistent with the general parameters derived from the fiber diffraction
model, but again showed a 7/2 symmetry. It was hypothesized that imino acid 
rich peptides have a 7/2 symmetry, but imino acid poor regions adopt a struc- 
ture closer to a 10/3 helix [7]. This was later confirmed with a synthetic peptide 
that contained an imino acid poor region [9, 10]. The structure also confirmed 
the expected hydrogen bond between the GlyNH and C=O of the proline in the
Xaa position of a neighboring chain. In addition, an extensive water network 
was found, which formed hydrogen bonds with several carbonyl and hydroxyl 
groups of the peptide [11, 12]. This was interpreted as evidence for a stabiliz- 
ing role of water, an issue that is highly controversial (see later). 
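
The helical parameters quoted above are related by simple arithmetic, which the following Python sketch illustrates. The function names are assumptions made for this example, and applying the fiber-diffraction rise of 0.286 nm per residue to a crystallized peptide is only an approximation.

```python
from fractions import Fraction


def residues_per_turn(units, turns):
    """Residues per turn for a helix in which `units` residues span `turns` turns."""
    return Fraction(units, turns)


def helix_length_nm(residues_per_chain, rise_per_residue_nm=0.286):
    """Approximate axial length, assuming a constant rise per residue."""
    return residues_per_chain * rise_per_residue_nm


print(float(residues_per_turn(10, 3)))  # 10/3 symmetry of the fiber model
print(float(residues_per_turn(7, 2)))   # 7/2 symmetry of the peptide crystals
print(helix_length_nm(10))              # axial repeat after ten residues
print(helix_length_nm(30))              # ten Pro-Hyp-Gly repeats per chain
```

With these assumptions the 10/3 helix gives about 3.33 residues per turn and an axial repeat of about 2.86 nm, and a chain of 30 residues gives a length of about 8.6 nm, close to the 8.7 nm reported for the peptide in [7].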

Additional structural work was performed with the peptides (Pro-Pro-Gly)10
[13, 14], (Pro-Hyp-Gly)10 [8] and (Pro-Pro-Gly), [15]. In the latter case the
unusually high resolution of 0.1 nm was achieved. A structure of (Pro-Pro-Gly)10,
obtained from crystals grown in microgravity at the unusually high resolution
of 0.13 nm, showed a preferential distribution of proline backbone and
side chain conformations, depending on the position [15-17]. Proline residues
in the Xaa position exhibit an average main chain torsion angle of φ = -75° and
a positive side-chain angle (down puckering of the proline ring) while proline
residues in the Yaa position were characterized by a significantly smaller main
chain torsion angle of -60° and a negative side chain angle (up puckering).
These results were used to explain the stabilizing effect of hydroxyproline in the 
Yaa position, because hydroxyproline has a strong preference for up puckering. 
It also would explain the destabilizing effect of 4(R)-hydroxyproline in the Xaa 
position (see later). 

A functionally important structural feature of the collagen triple helix is the 
orientation of side chains. Side chains of residues in X- and Y-positions point 
out of the helix and are freely accessible for binding interactions. In fibril 
formation, intermolecular interactions between oppositely charged residues
are important and hydrophobic interactions between residues of different molecules
also play a role [19, 20]. The interactions between the collagen molecules
in a fiber cause the characteristic quarter repeat of 67 nm and the complex 
cross-striation banding, both seen in electron micrographs [21,22]. For quan- 
tification, the accurate distribution of residues at the interaction boundaries 
must be known. As mentioned, they are strongly dependent on helix symme- 
try. For simplification constant values of the pitch were assumed but different 
axial repeats ranging from 0.2 to 0.286 nm were used by different authors 
in such calculations. Not surprisingly very different interaction edges were 
predicted. In reality, the axial repeat may be rather variable depending on the 
sequence in a given region. As already mentioned, sequence dependent struc- 
tural variations were indeed observed by crystallography [10]. In addition, the 
relatively strong intermolecular interactions may also alter the pitch and other 
structural parameters of the triple helix [23]. 

Several NMR studies deal with the dynamics of the collagen triple helix [24-26]
but many questions concerning the flexibility and fluctuations of the structure 
are still open. Several structures of synthetic model peptides were elucidated 
by NMR [25, 27-31]. Isotopic labeling was used to observe specific residues 
using heteronuclear NMR techniques. Hydrogen exchange studies were used to 
show that the NH groups of glycine exchanged faster in the imino acid poor 
region of a synthetic peptide compared to the Gly-Pro-Hyp region [25]. The 
hydration of the triple helix was also studied by NMR [32]. The hydration shell 
was found to be kinetically labile with upper limits for water molecule residence
times in the nanosecond to sub-nanosecond range.

The flexibility of the triple helix is apparent from electron micrographs 
of single molecules. Curved rods with a persistence length comparable to that 
of actin filaments were observed [33] and sites of increased bending were 
observable in wild-type [33] and mutated [34] molecules. Interestingly, the fiber
forming collagen I is a rather straight rod, suitable for parallel lateral alignment
in fibers. In contrast, the triple helix of the basement membrane collagen IV
contains many bends, which are caused by interruptions in the regular
Gly-Xaa-Yaa-repeat [35]. The increased flexibility is of functional relevance for the
assembly of collagen IV to sheath-like structures. 


2 
Collagens 


Collagens are very abundant proteins in the animal kingdom and are mainly
located in the extracellular matrix. Typical locations are the mesogloea of
jellyfish, the cuticle of worms, basement membranes of flies and the connective
tissue of mammals. Many of the collagens are well conserved from invertebrates
to vertebrates; an example is the basement membrane collagen IV. There
are also species-specific collagens, for which the cuticle collagens may serve as examples.

The human collagen family of proteins now consists of 27 types. The individual
members are numbered with roman numerals. The family is subdivided
into different classes: the fibrillar collagens (types I*, II, III, V*, XI*, XXIV, and
XXVII), basement membrane collagens (type IV*), fibril-associated collagens
with interrupted triple helices (FACIT collagens, types IX*, XII, XIV, XVI, XIX,
XX and XXI), short chain collagens (types VIII* and X), anchoring fibril collagen
(type VII), multiplexins (types XV and XVIII), membrane associated collagens
with interrupted triple helices (MACIT collagens, types XIII, XVII,
XXIII, and XXV), and collagen type VI*. The types indicated by an asterisk are
heterotrimers, consisting of two or three different polypeptide chains. Type IV
collagens contain six different polypeptide chains that form at least three
distinct molecules and type V collagens contain three polypeptide chains in
probably three molecules.

The common feature of all collagens is the triple helical domain with the 
repeated sequence -Gly-Xaa-Yaa- in the primary structure and the high content 
of proline and hydroxyproline residues, respectively. A large variation in the 
length of the triple helix can be found in nature [36]: the shortest collagen
described is 14 nm long (minicollagen of the nematocyst wall in hydra) and the
longest ones 2400 nm (cuticle collagen of annelids). Collagens, however, also contain
variable numbers of non-collagenous domains (frequently abbreviated
NC-domains) including globular domains (examples: von Willebrand type A
domains, fibronectin type III domains) and three-stranded coiled-coil regions.
In their multidomain nature collagens do not differ from other multifunctional 
modular extracellular matrix proteins. However, the elongated shape of the triple
helical domains and the linear arrangement of solvent exposed sidechains (see 
previous section) lend special properties to collagens. They are well suited to 
form long fibrils or assemble to complex supramolecular structures in skin, bone, 
tendon, cartilage, basement membranes and other connective tissues. The types 
of supramolecular structures differ considerably for different collagen types and 
reviews can be found in [20, 37]. Besides their structural roles, collagens interact 
with numerous other molecules and are crucial for development and home- 
ostasis of connective tissue. Much studied are the binding sites for numerous in- 
tegrins, which are ubiquitous cellular receptors for extracellular matrix proteins. 

The domain organization of a typical fibril forming collagen (collagen III)
is shown in Fig. 2. This collagen consists of three identical chains. After removal
of the signal peptide (not shown) it starts with an N-terminal non-collagenous
domain also called N-propeptide, which is followed by a short triple helix with
10 Gly-Xaa-Yaa-tripeptide units per chain. A short sequence of residues without
a Gly-Xaa-Yaa-repeat links it to the central triple helix of 343 tripeptide units
per chain. Both triple helices are terminated by disulfide knots, which interlink
all three chains. The sequences involved are GSOGPOGICESCPT in the short
and GPOGAOGPCCGG in the central triple helix. The disulfide knot is followed
by a large C-terminal domain (C-propeptide), which is essential for chain
registration and triple helix nucleation (see later).

Fig. 2 Domain organization of collagen III. At the top of the figure collagen III is shown
before processing by N- and C-proteinase, which cleave at sites marked by arrows after
secretion. Triple helix forming regions with (GlyXaaYaa)-repeats are indicated by solid vertical
lines and disulfide knots connecting the three identical chains by vertical bars. Dashed lines
represent telopeptide regions with no (GlyXaaYaa)-repeats. Together with the N-propeptide
the short triple helical region Col 1-3 is cleaved off, which served as model for triple helix
folding. Many other folding studies were performed with mature processed collagen III
(lower part of the figure), which contains 1018 tripeptide units in a triple helix. The number
of tripeptide units in the triple helix is calculated from the number of tripeptides in each
chain n by 3n-2, because 2 residue units are not incorporated. The chains are staggered by
one residue per chain. Figure labels: non-processed type III collagen (procollagen III);
C-propeptide; Col 1-3 triple helix (43 tripeptides); processed type III collagen (triple helix
contains 1018 tripeptides)
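
The point at which the Gly-Xaa-Yaa repeat breaks off in the two disulfide-knot sequences quoted for collagen III can be located with a short scan. The Python sketch below is illustrative only; the function name is an assumption made for this example, and O denotes 4(R)-hydroxyproline as in the list of abbreviations.

```python
def leading_gxy_triplets(sequence):
    """Count consecutive Gly-Xaa-Yaa triplets from the start of a one-letter sequence."""
    count = 0
    for i in range(0, len(sequence) - 2, 3):
        if sequence[i] != "G":  # the repeat is broken when position 1 is not glycine
            break
        count += 1
    return count


for knot in ("GSOGPOGICESCPT", "GPOGAOGPCCGG"):
    n = leading_gxy_triplets(knot)
    # Print the triplets that still follow the repeat and the remainder after it.
    print(knot[: 3 * n], "|", knot[3 * n:])
```

In both sequences the scan finds three regular triplets before the repeat ends within or directly after the cysteine residues of the knot.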


3 
Other Proteins with Collagen Triple Helices 


Gly-Xaa-Yaa repeats that form triple helices are found in a number of proteins, 
which are not classified as collagens: complement protein C1q, lung surfactant
proteins A and D, mannose binding protein, macrophage scavenger receptors A 
(types I, II, and III), MARCO, ectodysplasin-A, scavenger receptor with C-type 
lectin (SRCL), the ficolins (L, M and H), the asymmetric form of acetylcholin- 
esterase, adiponectin, and hibernation proteins HP-20, 25, and 27. For these 
proteins, triple helices are usually much shorter than for collagens and the po- 
tential for fibril formation is less prominent. Instead, the triple helical domains 
may serve as spacer elements between other domains or as oligomerization 
domains, which bundle three or more functional domains to multivalent com- 
plexes. The triple helices may also contain binding sites for interactions with 
other binding partners as it was demonstrated for macrophage scavenger 
receptors. 

C1q, a subunit of the first component of complement C1 [38], is a classical
example for a functionally important oligomerization of binding domains by
triple helices. The globular heads of C1q, which bind to immunoglobulins, are
composed of three domains connected by a collagen triple helix. Six trimers are
linked by a collagen microfibril, leading to the flower bouquet-like shape of the
entire C1q molecule. The heads have a weak potential to bind to the Fc domains
of IgG and IgM but binding of individual heads is too weak to mediate efficient
interactions. Only when the heads are connected does C1q strongly interact
with clusters of IgG. It is believed that the oligomeric structure of C1q is
designed for efficient binding of clusters of IgG at an immunologically marked
cell surface, avoiding binding to isolated IgG, which would cause unwanted
reactions with the complement system [39]. A similar effect is observed for
other proteins in which oligomers are needed for high affinity [38]. It is well
known that most lectins recognize monomeric sugars with only weak affinity and
polymeric structures with high affinity. In many cases, this physiologically
important feature is generated by oligomerization of lectin domains. In an important
class of lectins, the collectins, oligomerization is achieved by collagen triple
helices, which trimerize the C-type lectin domains at their C-termini [40, 41].
The distance between the carbohydrate binding sites is optimal for recognition
of repeating units in a sugar chain or for cross-linking between chains [39, 42]. 
In macrophage scavenger receptor [43] the globular heads are connected to 
a collagen triple helix, which is followed by a three-stranded coiled coil. The two 
three-stranded structures probably stabilize each other in a mutual manner. 


4 
Domains Involved in Chain Alignment 


Before turning to triple helix folding it is important to become familiar with the domains
flanking the triple helical regions. These domains are frequently called N- and 
C-terminal propeptides or more globally non-collagenous (NC-) domains. As 
an example see the domain organization of procollagen III in Fig. 2. Up to about 
1980 it was believed that the few collagens known at this time (mainly colla- 
gen I) consist of a triple helix and short terminal so-called telopeptides only. To- 
day it is known that the N- and C-terminal propeptides are still present at the 
time at which the three polypeptide chains assemble and fold to native collagen 
molecules. It will be shown in the following sections that they play an important 
role in chain alignment, selection of proper chain composition and nucleation 
of triple helix formation. It is therefore not surprising that experiments prior to 
the discovery of the propeptides on folding of collagen triple helices yielded 
data, which do not reflect the physiological situation. Early experiments suffered 
from poor refolding yields, misalignments of chains and formation of wrong 
products. A word of warning is in order because even today commercially
produced collagens without their natural propeptides are used for physical
and chemical model studies without a clear recognition that important
helper-domains are missing. 

In the biosynthesis of fibrillar collagens the propeptides are proteolytically
removed only briefly before fibril formation and this event plays an important
regulatory role in this process. The liberated propeptides are stable structures
and can be monitored for long periods of time in blood circulation and
tissue distributions. Biological functions have been assigned to the circulating 
prodomains and deviations. In other collagens (collagens IV and VI) the 
NC-domains stay as part of the structure even after assembly. 

In collagen III the amino-terminal end of the carboxy-terminal propeptide 
contains short coiled-coil segments with three to four heptad repeats that might 
facilitate the trimerization potential of this domain and the nucleation of the 
triple helix [44]. Coiled-coil structures are very frequently occurring oligomerization
domains [45, 46]. The telopeptides also contain lysine residues for
the formation of covalent crosslinks between molecules. The globular amino- 
terminal propeptide (Fig. 2) potentially interacts with growth factors and is 
probably not involved in triple helix formation. Coiled-coil domains are prevalent
in non-fibrillar collagens as well. The transmembrane domain containing
collagens XIII, XVII, XXIII and XXV show coiled-coil domains at the amino-terminal
end of the extracellular domain, close to the start of the triple helix, which
are apparently involved in alignment and nucleation [47].


5 
Model Peptides 


Synthetic peptides containing Gly-Xaa-Yaa repeats have been extensively used to 
study the thermal stability and folding of the collagen triple helix. These peptides 
can be synthesized as either single chains or crosslinked peptides. Solid-phase
methods allowed the synthesis of peptides with a defined chain length.
(Pro-Pro-Gly)n with n = 5, 10, 15, and 20 [48], (Pro-4(R)Hyp-Gly)n with n = 5-10 [49], and
(Pro-Pro-Gly)n with n = 10, 12, 14, and 15 [50] were synthesized. These peptides
were widely distributed and studied in several research groups. Work was also
performed with (Gly-Pro-Gly)n with n = 1-8 [51] and (Gly-Pro-Pro)n with n = 3-7
[52]. Sutoh and Noda introduced the concept of block copolymers, where two
blocks of (Pro-Pro-Gly)n, n = 5, 6, or 7, were separated by a block of (Ala-Pro-Gly)m,
m = 5, 3, and 1, respectively [50]. This concept was later extended to include all
amino acids and called the host-guest peptide system. The most studied host-guest
system uses the sequence Ac-(Gly-Pro-4(R)Hyp)3-Gly-Xaa-Yaa-(Gly-Pro-4(R)Hyp)4-Gly-Gly-NH2
[53], but other variations were also studied [54, 55].

Because of the concentration dependence and the problem of formation of 
misaligned structures from single chain peptides, covalently linked homotrimeric
peptides were synthesized by various methods. Propane tricarboxylate
tris(pentachlorophenyl) and N-tris(6-amino hexanoyl)-lysyl-lysine were used as
covalent bridging molecules for the three chains [56]. Other covalent linkers
include Kemp triacid [57], a modified di-lysine system [58] and tris(2-amino- 
ethyl)amine with succinic acid spacers [59]. 

Peptides with covalent crosslinks at both ends were also synthesized [60]. 
Peptide amphiphiles [61-63], and the iron(II) complex of bipyridine [64] have 
also been used to study trimeric peptides. 

The sequence found at the carboxy-terminal end of type III collagen, Gly-Pro-Cys-Cys-Gly
(see above), was introduced in synthetic peptides or recombinantly
expressed model proteins and trimerization was achieved by reoxidation in the
presence of mixtures of reduced and oxidized glutathione [30, 31, 65-68]. The
non-covalent obligatory trimer foldon [67, 69] was fused to the C-terminus or
N-terminus of model peptides or recombinant model proteins and in this way
a very effective trimerization was achieved. Very recently a larger version of
foldon (minifibritin) was combined with a collagen III-like disulfide knot to
facilitate the oxidative disulfide coupling [70]. Initially, all peptides were studied
as homotrimers, but strategies were developed to synthesize heterotrimers.
Fields proposed the use of the di-lysine system to synthesize heterotrimeric 
peptides of type IV collagen [58]. Moroder used a simplified cysteine bridge to 
synthesize the collagenase cleavage site of type I collagen [71] and the integrin 
binding site of type IV collagen [72]. 
6
The Coil to Triple Helix Transition 


Model peptides, which form triple helices, have been instrumental for the elucidation
of the mechanism of the coil ⇌ triple helix transition and stabilizing
interactions because of their uniform sequence and short length. We shall
therefore describe their transition prior to those of long intact collagens. For
short chains and a high cooperativity, intermediates in a sequential folding
process can be neglected and the transition may be approximated by an "all-or-none"
process from randomly coiled species C to a fully helical species H with
a single equilibrium constant [73]:


$$K = \frac{[H]}{[C]^3} = \frac{F}{3 c_0^2 (1 - F)^3} \qquad (1)$$


Here c0 = 3[H] + [C] is the total concentration of chains and F = 3[H]/c0 the
degree of helicity. Each triple helix H contains 3n-2 tripeptide units in helical
state. Subtraction of 2 originates from the chain stagger by which 2 tripeptide
units cannot fully participate in the triple helix. Concentrations of H and C may
also be expressed in concentrations of tripeptide units in either helical or coiled
state.
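
Eq. (1) gives K as a function of F, whereas in practice the degree of helicity for a given K and chain concentration is often wanted. The Python sketch below shows one way to invert the relation numerically. It is a minimal illustration under stated assumptions: the function names are invented for this example, concentrations are taken in mol/l relative to a 1 M standard state, and bisection is chosen only for simplicity.

```python
def equilibrium_constant(F, c0):
    """Eq. (1): K = F / (3 c0^2 (1 - F)^3) for three non-linked chains."""
    return F / (3.0 * c0**2 * (1.0 - F) ** 3)


def helicity(K, c0, tol=1e-12):
    """Solve Eq. (1) for F by bisection on the interval [0, 1].

    The function F - 3 K c0^2 (1 - F)^3 rises monotonically from a negative
    value at F = 0 to 1 at F = 1, so it has exactly one root.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid < 3.0 * K * c0**2 * (1.0 - mid) ** 3:
            lo = mid  # root lies at higher helicity
        else:
            hi = mid
    return 0.5 * (lo + hi)


def helical_tripeptides(n):
    """Tripeptide units in helical state for chains of n tripeptides each (3n - 2)."""
    return 3 * n - 2


c0 = 1.0e-3                               # hypothetical chain concentration in mol/l
K_mid = equilibrium_constant(0.5, c0)     # K at the midpoint of the transition
print(K_mid, helicity(K_mid, c0))
print(helicity(K_mid, 10 * c0))           # same K, tenfold higher concentration
```

At the same K, a higher chain concentration gives a higher degree of helicity, which is the origin of the concentration dependence of the transition.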

From 


$$\Delta G^\circ = -RT \ln K = \Delta H^\circ - T \Delta S^\circ \qquad (2)$$


and from Eq. (1) it follows that at the midpoint of the transition (F = 0.5 and
T = Tm),

$$T_m = \frac{\Delta H^\circ}{\Delta S^\circ + R \ln(0.75 c_0^2)} \qquad (3)$$

where ΔG° is the standard Gibbs free energy, ΔH° is the standard enthalpy, and
ΔS° is the standard entropy. It is important to note that Tm is concentration
dependent and should only be compared at identical molar chain concentrations
or after correction for concentration differences
with Eq. (3).
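
The correction for concentration differences can be sketched in a few lines of Python. The example is illustrative only: the function names and the numerical values are assumptions, ΔH° and ΔS° are taken per mole of triple helix, and concentrations are in mol/l relative to a 1 M standard state.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def melting_temperature(dH, dS, c0):
    """Eq. (3): Tm = dH / (dS + R ln(0.75 c0^2)) for non-linked chains."""
    return dH / (dS + R * math.log(0.75 * c0**2))


def entropy_from_tm(dH, tm, c0):
    """Eq. (3) solved for dS at a known Tm and chain concentration."""
    return dH / tm - R * math.log(0.75 * c0**2)


def tm_at_other_concentration(tm1, c1, c2, dH):
    """Convert a Tm measured at chain concentration c1 to concentration c2."""
    dS = entropy_from_tm(dH, tm1, c1)
    return melting_temperature(dH, dS, c2)


# Hypothetical peptide: Tm = 310 K at 1 mM chains, dH = -400 kJ/mol of triple helix.
print(tm_at_other_concentration(310.0, 1.0e-3, 1.0e-2, -4.0e5))
```

For an exothermic helix formation (negative ΔH°) the calculated Tm rises with increasing chain concentration, so values measured at different concentrations are not directly comparable.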

ΔH° can be obtained from the slope of the transition curves. The van't Hoff
enthalpy can be approximated from the slope of the transition curve at its midpoint:

$$\Delta H^\circ = 8 R T_m^2 \left( \frac{dF}{dT} \right)_{F=0.5} \qquad (4)$$
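
As an illustration of Eq. (4), the following Python sketch estimates Tm and the midpoint slope from tabulated helicity values and converts the slope into a van't Hoff enthalpy. The function names and the two data points are assumptions made for this example, and linear interpolation between the two points bracketing F = 0.5 is only a crude estimate of the slope.

```python
R = 8.314462618  # gas constant in J/(mol K)


def midpoint_and_slope(temps, fractions):
    """Return (Tm, dF/dT at F = 0.5) by linear interpolation across F = 0.5."""
    points = list(zip(temps, fractions))
    for (t1, f1), (t2, f2) in zip(points, points[1:]):
        if (f1 - 0.5) * (f2 - 0.5) <= 0.0 and f1 != f2:
            slope = (f2 - f1) / (t2 - t1)
            tm = t1 + (0.5 - f1) / slope
            return tm, slope
    raise ValueError("transition curve does not cross F = 0.5")


def vant_hoff_enthalpy(tm, slope):
    """Eq. (4): dH = 8 R Tm^2 (dF/dT) at F = 0.5, in J per mole of triple helix."""
    return 8.0 * R * tm**2 * slope


# Hypothetical helicity values (temperatures in K) around the midpoint.
tm, slope = midpoint_and_slope([300.0, 310.0], [0.6, 0.4])
print(tm, slope, vant_hoff_enthalpy(tm, slope))
```

Because F decreases with temperature, the slope is negative and the resulting enthalpy of helix formation is negative as well.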


or more accurately by fitting of the entire transition curve with Eq. (1) and

$$K = \exp\left[ \frac{\Delta H^\circ}{RT} \left( \frac{T}{T_m} - 1 \right) - \ln(0.75 c_0^2) \right] \qquad (5)$$
This relation is obtained by solving Eq. (3) for ΔS° and substituting for ΔS° in
Eq. (2). The temperature dependence of ΔH° is neglected, because the specific
heat of the coil ⇌ triple helix transition was found to be close to 0 [74]. F is
determined from the molar ellipticity [Θ] of CD or any other signal proportional
to helicity. When measuring CD the molar ellipticity is given by

$$[\Theta] = F \left( [\Theta]_h - [\Theta]_c \right) + [\Theta]_c$$

[Θ]h and [Θ]c are the ellipticities for the peptide in either the completely triple
helical conformation or in the unfolded conformation. These values normally
show linear temperature dependencies which can be described by

$$[\Theta]_h = [\Theta]_{h,m} + p (T - T_m)$$
$$[\Theta]_c = [\Theta]_{c,m} + q (T - T_m)$$

Here the ellipticities at the midpoint temperature Tm are arbitrarily used as
reference points. For a best fit of the transition curves, initially Tm is kept
constant and ΔH°, p and q are varied. After an initial fit, Tm is then also
varied.
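
The model curve that enters such a fit can be written as a small forward function: K from Eq. (5), F from Eq. (1), and the ellipticity from F and the two linear baselines. The Python sketch below is a minimal illustration and not the fitting program used by the authors; all function and parameter names and the numerical values are assumptions, and a least-squares routine of the reader's choice would vary ΔH°, p, q and later Tm as described above.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def k_of_t(T, dH, tm, c0):
    """Eq. (5): K(T) with the entropy eliminated through Tm and c0."""
    return math.exp(dH / (R * T) * (T / tm - 1.0) - math.log(0.75 * c0**2))


def helicity(K, c0, tol=1e-12):
    """Solve Eq. (1), K = F / (3 c0^2 (1 - F)^3), for F by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid < 3.0 * K * c0**2 * (1.0 - mid) ** 3:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ellipticity(T, dH, tm, c0, theta_h_m, theta_c_m, p, q):
    """Model ellipticity at temperature T with linear helix and coil baselines."""
    F = helicity(k_of_t(T, dH, tm, c0), c0)
    theta_h = theta_h_m + p * (T - tm)  # fully triple helical baseline
    theta_c = theta_c_m + q * (T - tm)  # unfolded baseline
    return F * (theta_h - theta_c) + theta_c


# Hypothetical parameter set; temperatures in K, chain concentration in mol/l.
params = dict(dH=-4.0e5, tm=310.0, c0=1.0e-3,
              theta_h_m=4000.0, theta_c_m=-500.0, p=-10.0, q=5.0)
for T in (290.0, 300.0, 310.0, 320.0, 330.0):
    print(T, round(ellipticity(T, **params), 1))
```

At T = Tm the model returns the mean of the two baseline values, which provides a quick consistency check on any implementation.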

For peptides that contain a cross-link, the triple helix ⇌ coil transition
becomes concentration independent and Eqs. (1) and (3) simplify to

$$K = \frac{F}{1 - F}$$

and

$$T_m = \frac{\Delta H^\circ}{\Delta S^\circ}$$

The fitting procedure for the determination of the thermodynamic parameters
is otherwise the same as for non-linked chains.
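
For the cross-linked case the helicity follows in closed form from K = F/(1 - F). The short Python sketch below illustrates this; the function names and the numerical values are assumptions chosen so that Tm equals 300 K.

```python
import math

R = 8.314462618  # gas constant in J/(mol K)


def tm_crosslinked(dH, dS):
    """Tm = dH / dS for a cross-linked (concentration independent) trimer."""
    return dH / dS


def helicity_crosslinked(T, dH, dS):
    """F = K / (1 + K) with K = exp(-dG / RT) and dG = dH - T dS."""
    dG = dH - T * dS
    return 1.0 / (1.0 + math.exp(dG / (R * T)))


dH, dS = -3.0e5, -1.0e3  # hypothetical values in J/mol and J/(mol K)
print(tm_crosslinked(dH, dS))
for T in (290.0, 300.0, 310.0):
    print(T, helicity_crosslinked(T, dH, dS))
```

No chain concentration appears in these expressions, in contrast to Eqs. (1) and (3) for non-linked chains.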

ΔH° can also be determined calorimetrically. Most peptides for which
calorimetric data are available show a ratio of the van't Hoff enthalpy to
the calorimetric enthalpy close to 1, indicating that the all-or-none mechanism
is a good approximation. However, exceptions have been reported, especially
for heterotrimeric peptides [72]. Great care has to be taken to evaluate
true equilibrium transition curves by performing measurements at a sufficiently
slow rate. Because of the slow folding process in the transition region,
hysteresis and slow establishment of equilibrium are observed, as demonstrated
for Ac-(Gly-Pro-4(R)Hyp)10-NH2 in Fig. 3. The heating and cooling rate
was as low as 10 °C/h but large deviations between transition curves recorded
by heating and cooling were observed. Differences are dramatic at low concentrations.
Fig. 3 Triple helix ⇌ coil transition of Ac-(Gly-Pro-4(R)Hyp)10-NH2 in water. The transition
was monitored by circular dichroism at 225 nm. Closed symbols are for heating and open
symbols for cooling. The heating/cooling rate was 10 °C/h


7 
Thermodynamic Values for Stability 


The temperature at the midpoint of the transition, Tm, is frequently used as a
measure of stability in comparisons of different peptides or collagens. As
shown above, this value depends on the concentration of the peptide and
the rate of heating, and only values either measured at or corrected to the same
concentration and heating rate should be compared. Unfortunately, most
publications do not report concentrations. Published thermodynamic data
for the triple helix ⇌ coil transitions of identical synthetic peptides vary
significantly (Tables 1 and 2). The commercially available peptides (Pro-Pro-Gly)10
and (Pro-4(R)Hyp-Gly)10 have been measured in several laboratories. The van't
Hoff enthalpy change varies from -7.91 to -18.8 kJ/mol tripeptide units for
(Pro-Pro-Gly)10 and from -13.4 to -25.25 kJ/mol tripeptide units for
(Pro-4(R)Hyp-Gly)10. These deviations point to the above mentioned problems
of achieving equilibrium and to other sources of error, including unknown
concentration dependencies, misalignments of single chain peptides and missing
checks of the validity of the "all-or-none" approximation. It is recommended
to consult the individual publications in order to gain information on
reliability.

Table 1 presents a selection of data on model peptides with repeating
Gly-Xaa-Yaa units and Table 2 contains selected data on host-guest peptides. Note
that the values of ΔH°cal, ΔH°, and ΔS° of the homotripeptides in Table 1 are
Table 1 Thermodynamic data of the triple helix ⇌ coil transition of model peptides in aqueous
solution determined from transition curves (ΔH°, ΔS°) and differential scanning calorimetry
(ΔH°cal)

Peptide   Tm (°C)   ΔH° (kJ/mol tripeptide)   ΔS° (J/mol tripeptide/K)   ΔH°cal (kJ/mol tripeptide)   Reference

(Pro-Pro-Gly)10   28   -10.6   -26.5   [50]
(Pro-Pro-Gly)10   24.6   -7.91   -22.4   -7.68   [73]
(Pro-Pro-Gly)10   25   -18.8   -58.6   [75]
(Pro-Pro-Gly)10   34   -18.6   -60.9   [76]
(Pro-Pro-Gly)n, n=10, 15, 20   -8.2   -19.2   [77]
(Pro-Pro-Gly)n, n=12, 14, 15   -10.6   -26.5   [50]
(Pro-Pro-Gly)10   32.6   -6.43   [78]
(Pro-4(R)Hyp-Gly)10   57.3   -13.4   -36.4   -13.4   [73]
(Pro-4(R)Hyp-Gly)10 pH 1   60.8   -23.6   -65.6   [79]
(Pro-4(R)Hyp-Gly)10 pH 7   57.8   -23.3   -65.2   [79]
(Pro-4(R)Hyp-Gly)10 pH 13   60.8   -25.1   -70.7   [79]
(Pro-4(R)Hyp-Gly)10   60.0   -13.9   [78]
(Pro-4(R)Flp-Gly)10   80   -17.1   [76]
Ac-(Gly-4(R)Hyp-Thr)10-NH2   18   -27.1   -87.1   -27.5   [80]
Ac-(Gly-Pro-Thr(βGal))10-NH2   38.8   -14.0   -39.6   [80]
Ac-(Gly-4(R)Hyp-Thr(βGal))10-NH2   50.0   -11.1   -29.0   [80]
((Gly-Pro-Thr)10-Gly-Pro-Cys-Cys)3   13.8   -7.1   -41.5   -11.9   [81]
((Gly-Pro-Pro)10-Gly-Pro-Cys-Cys)3   82   [69]
((Gly-Pro-Pro)10)3 foldon   66   -10.7   [67]
                            70   [69]



expressed per mol tripeptide units in order to compare data of peptides with
different lengths. It has to be remembered, however, that contributions of tripeptide
units might not be completely additive because of the energies involved in
nucleation of the triple helix [85]. For the host-guest peptides with different
sequences in the guest and host, only total energies per mol triple helix are given
in Table 2. Thermodynamic values are dominated by the large proline- and
hydroxyproline-rich host, and the influence of the guest tripeptide is rather small.
The effects of tripeptides of interest are only noticed by comparing Tm and
other thermodynamic values with those of a reference guest peptide. Tm values
are normally used as a measure of stability, although in a strict thermodynamic
sense standard values of the Gibbs free energy ΔG° at 298.2 K (25 °C) should be
used. Such values have not been evaluated for model peptides and only values
at the midpoint temperature can be calculated by ΔG°(Tm) = ΔH° - Tm ΔS°.
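
As a worked illustration of the last relation, the following sketch evaluates ΔG° at the midpoint temperature from tabulated ΔH° and ΔS° values. The function name is an assumption; the input values in the usage note are those listed for (Pro-Pro-Gly)10 from reference [73] in Table 1.

```python
def gibbs_energy_at(temp_c, delta_h_kj, delta_s_j):
    """Standard Gibbs energy in kJ/mol at a temperature given in degrees C.

    delta_h_kj is the enthalpy in kJ/mol, delta_s_j the entropy in
    J/(mol K); both refer to the same molar unit (here tripeptide units).
    """
    temp_k = temp_c + 273.15
    return delta_h_kj - temp_k * delta_s_j / 1000.0
```

For Tm = 24.6 °C, ΔH° = -7.91 kJ/mol and ΔS° = -22.4 J/mol/K, `gibbs_energy_at(24.6, -7.91, -22.4)` gives about -1.24 kJ/mol tripeptide units. As stressed in the text, such numbers depend on peptide concentration and scan rate and should be compared with caution.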

The data summarized in Tables 1 and 2 clearly confirm the crucial role of 
imino acids, and in particular of 4(R)Hyp in Y-position, in the stabilization
of the triple helix. The dominating effects of proline and hydroxyproline in 
the flanking Gly-Pro-4(R)Hyp sequences of the host/guest peptides facilitate 
triple helix formation even for highly destabilizing guests. Clearly tripeptide 
Table 2 Thermodynamic data of the triple helix formation of host-guest peptides in PBS
determined from transition curves (ΔH°, ΔS°) and differential scanning calorimetry (ΔH°cal)

Peptide                       Tm      ΔH°             ΔS°              ΔH°cal          Reference
                              (°C)    (kJ/mol         (J/mol triple    (kJ/mol
                                      triple helix)   helix/K)         triple helix)

HG(a) Gly-Pro-4(R)Flp         43.7                                     -204            [78]
HG Gly-4(R)Hyp-Pro            43.0                                     -204            [78]
HG Gly-4(R)Hyp-4(R)Hyp        47.3                                     -217            [78]
HG Gly-Pro-Pro                45.5                                     -213            [78]
HG Gly-Ala-4(R)Hyp            39.9    -423.7          -1214                            [53]
HG Gly-Ala-4(R)Hyp            41.7    -480                                             [82]
HG Gly-Leu-4(R)Hyp            39.0    -437.5          -1256                            [53]
HG Gly-Phe-4(R)Hyp            33.5    -514.1          -1549                            [53]
HG Gly-Pro-Ala                38.3    -358.0          -1005                            [53]
HG Gly-Pro-Ala                40.9    -502                                             [82]
HG Gly-Pro-Leu                32.7    -514.6          -1549                            [53]
HG Gly-Pro-Leu                31.7    -514                                             [82]
HG Gly-Pro-Phe                28.3    -557.3          -1717                            [53]
HG Gly-Pro-Arg                47.2    -610            -1100                            [83]
HG Gly-Pro-Arg pH 2.7         45.5    -560            -1300                            [83]
HG Gly-Pro-Arg pH 12.2        43.1    -390            -1100                            [83]
HG Gly-Asp-Lys pH 2.7         26.5    -600            -1000                            [83]
HG Gly-Asp-Lys pH 12.2        29.9    -490            -1500                            [83]
HG Gly-Asp-Arg pH 2.7         33.4    -550            -1700                            [83]
HG Gly-Asp-Arg pH 12.2        34.4    -460            -1300                            [83]
HG Gly-Glu-Lys pH 2.7         29.5    -490            -1500                            [83]
HG Gly-Glu-Lys pH 12.2        33.1    -530            -1600                            [83]
HG Gly-Glu-Arg pH 2.7         37.3    -530            -1600                            [83]
HG Gly-Glu-Arg pH 12.2        39.1    -510            -1500                            [83]
HG Gly-Lys-Asp pH 2.7         30.5    -770            -2400                            [83]
HG Gly-Lys-Asp pH 12.2        30.2    -720            -2300                            [83]
HG Gly-Arg-Asp pH 2.7         28.8    -720            -2300                            [83]
HG Gly-Arg-Asp pH 12.2        31.9    -630            -1900                            [83]
HG Gly-Lys-Glu pH 2.7         36.5    -620            -1900                            [83]
HG Gly-Lys-Glu pH 12.2        31.6    -680            -2100                            [83]
HG Gly-Arg-Glu pH 2.7         35.0    -630            -1900                            [83]
HG Gly-Arg-Glu pH 12.2        32.2    -710            -2200                            [83]
HG Gly-Gly-Phe                19.7    -647            -2093                            [84]
HG Gly-Gly-Leu                23.9    -578            -1800                            [84]
HG Gly-Gly-Ala                25.0    -559            -1758                            [84]
HG Gly-Ala-Ala                29.3    -450            -1400                            [83]
HG Gly-Ala-Leu                27.8    -574            -1758                            [53]
HG Gly-Phe-Ala                23.4    -593            -1884                            [53]
HG Gly-Ala-Phe                20.7    -637            -2051                            [53]

(a) HG refers to the host-guest peptides with the structure
Ac-(Pro-4(R)Hyp-Gly)3-Gly-Xaa-Yaa-(Pro-4(R)Hyp-Gly)4-Gly-Gly-NH2. The tripeptide sequence
given after HG corresponds to Gly-Xaa-Yaa. The peptides were measured in PBS, pH 7, unless
indicated otherwise.


22 J. Engel - H. P. Báchinger 


units which lack Pro and 4(R)Hyp contribute little energy and homopeptides
containing these units are very unstable or do not form triple helices at all.
Replacement of 4(R)Hyp in Y-position by Pro leads to a decrease of Tm and to
less negative standard enthalpy values. The mode of stabilization by
proline and hydroxyproline will be discussed in the next section. An interesting
additional stabilization was discovered in deep-sea annelid collagens
[65] and followed up with model peptides by Bann et al. [80] (Table 1).
O-Glycosylation of threonine in position Y stabilizes the triple helix in a similar way
as hydroxyproline. Glycosylation of the OH-groups in the annelid collagen was
demonstrated by amino acid sequencing but the nature of the sugars remained
unknown. In the model peptides one galactose was attached to threonine with a
β-linkage. Interestingly, unglycosylated threonine has no stabilizing effect.
Another residue with high stabilization potential is arginine in position Y [83].
There are also data which suggest that glutamine in position X and arginine in
Y form stabilizing pairs.

More general attempts have been made to predict the stability contributions 
of different peptide units in collagens from thermodynamic data of peptides. 
The pioneering work of Dölz and Heidemann [86] and two recent papers may
serve as examples [78, 87]. 

When comparing the values in Tables 1 and 2 in detail many conflicts can be 
discovered and the limitations discussed in the beginning of this section should 
be kept in mind. Only a few thermodynamic data have been collected for trimeric
peptides generated by fusion with the disulfide knot of collagen III or a trimeric 
registration domain. The very stable trimeric foldon domain of T4-phage 
fibritin was successfully applied (Table 1) [67, 69]. For trimerized model pep- 
tides the Tm is much higher than for single chain peptides and is concentration 
independent. The oligomerization and registration of the chains leads to a 
stabilization by entropic effects [67, 88]. At the site of linkage the three chains 
are held in very close neighborhood. The local concentration at this site was es- 
timated to be close to 1 mol/l. The stabilization by foldon resembles the func- 
tion of the oligomerization domains in natural collagens. Recently foldon was 
linked to human type I and III collagen and efficient trimerization was ob- 
served [89]. Unexpectedly a large increase in yields of expression from a yeast 
system was also achieved. Trimerized peptides offer large advantages and are 
much closer to real collagens than single chain peptides. In future comparisons 
of stability the risk of errors from unknown concentrations, misalignments and 
hysteresis effects could be avoided with these models. 

Tables 1 and 2 only contain thermodynamic values of model peptides. Data
also exist for collagens and fragments derived from them. These are averages
of all contributions in a protein with a complicated sequence. It may be mentioned,
however, that the enthalpy change of the coil to triple helix transition of
collagen I was determined to be ΔH° = -(15-18) kJ/mol tripeptide units [74]. It
was also found that the specific heat of the reaction was near zero. In contrast
to globular proteins, ΔH° is therefore temperature independent, a feature
also found for some of the model peptides. Another general feature was noticed
by Privalov [74]. On a per residue basis, the change of enthalpy for structure
formation of the triple helix is significantly larger than that of globular proteins.


8 
Mechanism of Stabilization by Proline and Hydroxyproline 


The two imino acids proline and hydroxyproline stabilize polyproline-II-type
helices by specific steric restrictions imposed by the proline rings, which connect
the Cα-atoms with the preceding nitrogen in the polypeptide backbone.
The rotation around the Cα-N bond (Φ-angle) is thus frozen by the ring to
about Φ = -75 ± 35°. In addition, rotation around the Cα-CO bond (Ψ-angle)
is also heavily restricted by several clashes caused by unfavorable van der
Waals interactions [90]. It was demonstrated that only Ψ = 140° is accessible for
poly-L-proline or poly-L-hydroxyproline with all peptide bonds in trans conformation.
The two angles Φ and Ψ define the conformation of a polypeptide
backbone uniquely. In a Ramachandran plot (Φ vs Ψ) [91, 92] the polyproline-II
helix exists only in a narrow area defined by these angles. Note that in the
present review the newly defined zero origins for Φ and Ψ are used, whereas
the excellent book by Dickerson and Geis still uses the old ones. The stability
of the polyproline helix is largely defined by the intramolecular van der Waals
interactions, which are responsible for the restrictions in conformational
freedom. This fact is often overlooked in discussions about other modes of
stabilization (hydrogen bonds, hydrophobic and electrostatic interactions). For
imino acids, however, the steric restrictions imposed by the proline ring play the
dominant role.

Very early it was noticed that hydroxyproline exhibits an additional stabi- 
lizing effect. This followed from a comparison of the thermal stability of 
collagens from different species with different contents of proline and hydroxy- 
proline [74, 93] and more conclusively from a comparison of collagen-like 
model peptides (see previous section). Only 4(R)-hydroxyproline [94], but not 
4(S)-hydroxyproline [55, 95, 96] shows a stabilizing action indicating that the 
effect originates from the OH-group and its position at the proline ring. 

Two very different explanations were offered to explain the stabilizing effect 
of the OH-group of 4(R)-hydroxyproline. The first explanation is a stabilizing 
effect of water bridges between the OH-group and backbone groups. Water 
mediated bridges were seen in crystal structures of model peptides ([11] and 
citations therein) but it remained open, whether these indirect hydrogen bond 
networks provide stabilizing energy to the peptide. Clearly large unfavorable 
entropic contributions are expected [97]. The observation that the stability 
difference between Gly-Pro-Pro and Gly-Pro-Hyp peptides was maintained in
non-aqueous solvents even after careful removal of trace water [85] also argued
against the water bridge hypothesis. The second and now more generally 
accepted explanation is stabilization by the inductive effect of the OH-group. 




This notion is mainly based on the finding that 4(R)-fluoroproline in the Y-position
of the Gly-Pro-Yaa repeat increases the midpoint transition temperature
to even higher values than 4(R)-hydroxyproline [98]. The fluorine group does
not form hydrogen bonds but has a higher electronegativity than the OH group.

The replacement of the hydroxyl group by fluorine has several effects. The
inductive effect of the fluorine group influences the pucker of the pyrrolidine
ring. This is a stereoelectronic effect as it depends on the configuration of the
substituent. 4(R) substituents stabilize the Cγ-exo pucker, while 4(S) substituents
stabilize the Cγ-endo pucker. The X-ray structure of (Pro-Pro-Gly)10
showed a preference of proline residues in the Xaa position for the Cγ-endo pucker,
while the Yaa position prolines preferred Cγ-exo puckering [17, 18]. This puckering
influences the range of the main chain dihedral angles Φ and Ψ of
proline, which are required for optimal packing in the triple helix. In addition,
the trans/cis ratio of the peptide bond is influenced by the substituent [99-101].
Because all peptide bonds in the triple helix have to be trans, the helix is stabilized
if the amount of trans peptide bonds in the unfolded state is increased.


9 
Cis-trans Equilibria of Peptide Bonds 


Before dealing with the kinetics of collagen folding in the following sections it
is necessary to discuss the role of cis-trans isomerization of peptide bonds. The
peptide bond shows partial double bond character and is planar [91]. The
flanking Cα atoms can be either in trans (ω = 180°) or cis (ω = 0°) conformation.
For peptide bonds preceding residues other than proline, only a small fraction
(0.11 to 0.48%) was found to be in the cis conformation in the unfolded state
[102]. For peptide bonds preceding proline this fraction is much higher
(Table 3). The small energy difference between the two conformations is explained
by similar internal van der Waals interactions in the cis and trans conformations
because of the symmetry of the tertiary peptide nitrogen. In contrast, for
residues with side chains other than proline a large asymmetry exists between the
N-Cα bond and the NH group. cis contents from 10 to 30% were found in short
unstructured proline-containing peptides [103, 104]. This corresponds to an
equilibrium constant between cis and trans states in the unfolded state, Kcis/trans, of 0.11
to 0.43. Values of Kcis/trans for a number of collagen models are listed in Table 3.
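
The correspondence between cis content and equilibrium constant quoted above follows from Kcis/trans = [cis]/[trans]. A minimal sketch of the conversion in both directions is given below; the function names are illustrative only.

```python
def k_from_cis_fraction(f_cis):
    """Equilibrium constant [cis]/[trans] from the fraction of cis bonds."""
    return f_cis / (1.0 - f_cis)

def cis_fraction_from_k(k_cis_trans):
    """Fraction of cis bonds from the equilibrium constant [cis]/[trans]."""
    return k_cis_trans / (1.0 + k_cis_trans)

print(round(k_from_cis_fraction(0.10), 2))  # 0.11
print(round(k_from_cis_fraction(0.30), 2))  # 0.43
```

Applied in the other direction, the value of 0.087 listed for Xaa-Hyp bonds in unfolded type I collagen (Table 3) corresponds to a cis content of about 8%.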

The collagen triple helix can accommodate only trans peptide bonds, so the 
ratio of cis to trans peptide bonds in the unfolded state has a direct influence 
on the stability of the triple helix. 

In a transition from the unfolded state to the triple helix all cis peptide 
bonds have to be converted to trans because cis bonds cannot be incorporated in
this structure. Therefore, changes in Kcis/trans may influence stability. Changes of
the cis-trans equilibrium by inductive effects in hydroxyproline or fluoropro- 
line (see above) are however relatively small. Much larger effects of cis-trans 
isomerization are expected for the kinetics of the transition, its high activation 
Table 3 Kcis/trans in model compounds and unfolded type I collagen measured by NMR

Compound                       Kcis/trans   Reference

Ac-4(S)Hyp-OMe                 0.37         Bächinger and Peyton, unpublished
Ac-Pro-OMe                     0.16
Ac-4(R)Hyp-OMe                 0.12

Ac-Pro-OMe                     0.217        [100]
Ac-4(R)Hyp-OMe                 0.164
Ac-4(R)Flp-OMe                 0.149
Ac-4(S)Flp-OMe                 0.4

Ac-4(S)Hyp-OMe                 0.417        [99]

Ac-Pro-OMe                     0.21         [101]
Ac-4(R)Flp-OMe                 0.14
Ac-4(S)Flp-OMe                 0.39
Ac-4,4-difluoroproline-OMe     0.29

Unfolded type I collagen                    [105]
  Xaa-Pro                      0.19
  Xaa-Hyp                      0.087



energy (see following section). Kinetic parameters are also more significantly in- 
fluenced by inductive effects of fluoroproline than equilibrium constants [101]. 


10 
Kinetics of Triple Helix Formation 


For a discussion of the kinetic mechanism of triple helix formation several
steps have to be distinguished. Similar to the folding of other structures like
α-helices, DNA double helices or multi-stranded coiled-coil structures, the
nucleation of the helix is distinctly different from the following propagation
steps. Nucleation includes the first encounter of chains, chain selection and the
difficult and usually slow formation of a first nucleus of triple helical structure.
Since the probability of three chains meeting simultaneously in solution is very low,
dimeric intermediates likely occur during nucleation. Nucleation may happen
at many sites but may be limited to preferred sites of increased nucleation
potential in other cases. Depending on whether nucleation or propagation is the
rate limiting step, the kinetics will be completely different. For the folding of
the triple helix from single chains of unlinked model peptides or collagen fragments,
nucleation events are predominant at low peptide concentrations. For
(ProProGly)10 and (ProHypGly)10 a reaction mechanism with a dimeric intermediate
was derived [68] and the reaction order decreased from 3 (at very
low concentrations) to 1 (extrapolated for high concentrations). The change of
reaction order was explained by a switch from rate determining nucleation at low
Table 4 Rate constants and activation energies of triple helix folding from single and
trimerized chains at 20 °C

Protein                  Nucleation rate        Propagation rate     Ea (kJ/mol)
                         constant (M⁻² s⁻¹)     constant k (s⁻¹)

(ProProGly)10            900                    -                    7
(ProHypGly)10            About 10°              -                    8
(GlyProPro)10 foldon                            0.00197              54.5
(GlyProPro)10 Cys2                              0.00033              52.5


to rate determining propagation at high concentrations. Rate constants of nucleation
and propagation (k) as well as activation energies Ea derived for these
model peptides are listed in Table 4. Another kinetic study with a different single
chain model was interpreted by a different mechanism of nucleation [106].
Both studies demonstrate the complexity of the kinetics of folding from single
chains.

For this reason further kinetic work was performed with model peptides in 
which the three chains are linked, either by the disulfide knot of collagen III or 
by the obligatory trimer foldon [67-70]. For these models, which are much 
closer to true collagens, first order kinetics was observed and helix propagation 
was rate determining. Therefore, only propagation rate constants k could be de- 
termined (Table 4). 

Most, if not all, collagens and collagen-like proteins contain an oligomerization
domain or disulfide knot (see above), which keeps the chains in an aligned
trimeric state. These linkers provide the correct chain alignment and provide
a high local concentration near the connecting point, thus facilitating nucle- 
ation. Nucleation is therefore much faster than propagation and is not observed 
in the kinetic process. In most collagens the oligomerization domains are lo- 
cated at the C-terminal end of the triple helical domain and propagation pro- 
ceeds in a zipper-like fashion from the C- to N-terminus. This directionality of 
folding was very clearly demonstrated for collagen III by monitoring the 
growth of trypsin resistant helical segments [107]. Here folding starts at the 
disulfide knot (Fig. 2). Conclusive data were also obtained for collagen IV, in 
which propagation starts at the C-terminal propeptide [108]. In this case it was 
possible to visualize the growth of the triple helix directly by electron microscopy.
The zipper-like growth was confirmed by monitoring the chain length
dependence of the folding kinetics [107, 109, 110]. The short and the major
triple helical domains of collagen III were employed and supplemented by the 
three-quarter fragment of this molecule (Fig. 4). 

Interestingly the kinetics proceeds in two phases. In the kinetic experiments 
of Fig. 4 the dead time was 25 s and therefore the fast phase remained unre- 
solved. Its amplitude was about 50% of the total for the small triple helix of pep- 
tide Col 1-3 but was hardly detectable for the larger proteins. Here almost the 
entire transition proceeded in a slow phase. As expected for a zipper-like fold- 
ing initial rates of the slow phase were inversely proportional to the number of 
Fig. 4 Comparison of the folding kinetics of type III pN-collagen, the quarter fragment
of type III collagen, and peptide Col 1-3. All three proteins contained a disulfide knot at the
C-terminus. Refolding of the collagen (c), the quarter fragment (b), and Col 1-3 (a) was
measured by circular dichroism. The straight lines represent the initial rates. Original data
from [110]


tripeptide units in the triple helices. Because of a long linear phase of the 
reaction [110] half times of folding increased proportionally with the number 
of tripeptide units. 

The rate determining steps in the slow phase of propagation are cis-trans
isomerization steps of peptide bonds. As shown in the previous section and
Table 3, a large fraction of the peptide bonds preceding proline and hydroxyproline
is in the cis conformation at equilibrium in the unfolded state. In collagen III
such peptide bonds alternate with those of much lower cis potential, but
on average the fraction of cis bonds is very high. During folding, cis peptide
bonds have to be converted to the trans state required for the native triple
helix. The rate constant of cis to trans conversion is approximately
k(cis→trans) = 0.0003 s⁻¹ at 20 °C (k in Table 4) and similar rate constants were derived from
an analysis of the kinetics of collagen [110].

A second characteristic feature of kinetics determined by cis-trans isomerization
is the high temperature dependence. For collagen III, activation
energies of Ea = 44-70 kJ/mol were derived [107, 111]. The values match activation
energies which were determined for small proline-containing peptides [103,
104] and collagen model peptides [68]. In the transition from cis to trans the
double-bond-related energy is removed at the transition state, giving rise to a
high activation energy. To a first approximation, the value of Ea is therefore an
intrinsic parameter of the peptide bond, but dependencies on side chains were
noticed in studies with short peptides.
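
To give a feeling for what such an activation energy means in practice, the Arrhenius relation can be used to rescale a rate constant from one temperature to another. The sketch below is illustrative: the function name is an assumption, the reference values (0.0003 s⁻¹ at 20 °C, Ea = 50 kJ/mol) are those quoted in this section, and the target temperature of 30 °C is chosen arbitrarily.

```python
import math

R = 8.314  # gas constant in J/(mol K)

def arrhenius_scale(k_ref, ea_kj, t_ref_c, t_c):
    """Rate constant at t_c (deg C) from a reference value at t_ref_c.

    Assumes a temperature independent activation energy ea_kj (kJ/mol):
    k(T) = k_ref * exp(-Ea/R * (1/T - 1/T_ref)).
    """
    t_ref = t_ref_c + 273.15
    t = t_c + 273.15
    return k_ref * math.exp(-ea_kj * 1000.0 / R * (1.0 / t - 1.0 / t_ref))

k_30 = arrhenius_scale(0.0003, 50.0, 20.0, 30.0)
```

With Ea = 50 kJ/mol the rate constant roughly doubles between 20 and 30 °C, whereas with the low activation energies of 7 to 8 kJ/mol listed for the nucleation-controlled single chain peptides in Table 4 the same temperature step changes the rate by only about 10%.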

The fast phase of the folding process was correlated to the folding of the
short region of the polypeptide chains before a peptide bond in cis conformation
is met during zipper-like folding [110]. The amplitude was found to
increase when refolding was started from a non-equilibrium state of the
unfolded molecules in which fewer cis peptide bonds were present than in the
equilibrium state. The non-equilibrium state was achieved by refolding from
chains which were unfolded so quickly that most of the trans configuration
of the native state was maintained. Similar double jump experiments were
performed in classical experiments with ribonuclease S [112]. The fast phase
of collagen folding is biologically not very interesting, because natural collagens
fold predominantly by the slow phase. The kinetics of folding of a collagen
triple helix without restrictions by cis-trans isomerization is, however, an
appealing biophysical question. For the α-helix and the DNA double helix high
folding rates were determined [113], but the maximum rate of collagen triple
helix folding is still unknown.

As mentioned, collagen model peptides with fused oligomerization domains
were used in several studies. As in true collagens, nucleation is no longer rate
limiting and again a fast and a slow propagation reaction were observed. The rate
constants derived for the slow phase match values expected for cis-trans
isomerization (Table 4). An interesting study with such engineered collagen
models was performed in order to answer the question whether the kinetics of
triple helix folding may be different for nucleation at the C- or N-terminus. The
problem was approached by a comparison of model peptides (GlyProPro)10,
which were linked to trimers either at the N- or C-terminus [69]. Linkage was
achieved either by fusion with a short segment containing the disulfide knot
of collagen III or by attachment of the foldon domain. In both cases, a stabilization
of the triple helix after cross-linking was observed. The kinetics of
triple helix formation was identical for all four model systems (GlyProPro)10-foldon,
foldon-(GlyProPro)10, (GlyProPro)10-Cys2 and Cys2-(GlyProPro)10. Also
the activation energies of folding were identical for all four peptides. Rate constants
of folding were about 10⁻³ s⁻¹ at 20 °C and the activation energy was
50 kJ/mol [69]. The data indicate that triple helix formation proceeds with
closely similar rates in both directions. These findings are relevant in view of
recent suggestions that folding may start at the N-terminus for a group of
membrane bound collagens [47, 114].


11 
Hysteresis 


Unfolding and refolding of the triple helix is highly rate dependent in the tran- 
sition region. Consequently, transition curves recorded by heating and by 
cooling form a hysteresis loop [115-117]. Hysteresis is very prominent for long 
natural collagens like collagen III and is also observed under isothermal con- 
ditions when unfolding is induced by increasing the concentration of guanidine 
hydrochloride. In the case of collagen III with an intact disulfide knot at the 
C-terminus, correct and complete refolding of native molecules is achieved at 
temperatures 10 or more degrees below the transition temperature (or 0.5 mol/l 
below the guanidinium hydrochloride midpoint concentration) after short 
incubation times of a few hours. In these cases, true hysteresis, in which equilibrium
values are not reached even after very long waiting times, is limited to
the range near the transition temperature. As mentioned, unlinked single 
chains refold to misaligned structures at low temperatures. In these cases, an 
apparent hysteresis is observed, when monitoring circular dichroism or other 
parameters (Fig. 3). The nature of this apparent hysteresis is very different from 
the true hysteresis because refolding products differ from the native parent 
molecules. 

It was noticed that hysteresis was less prominent for short collagen triple 
helices like the one-quarter fragment of collagen III [115]. More recently, it was 
found that hysteresis loops are clearly observable also for short model peptides 
as long as the heating and cooling rates are not too slow. The major difference 
between native long triple helices and short model peptides is apparently the 
time dependence. For long triple helices the loops persist even at very slow scan 
rates whereas for short triple helices equilibrium is achieved more quickly. 

For the model systems with linked chains, a simple kinetic hysteresis mechanism
is proposed (Boudko, Bächinger and Engel, in preparation). In the
transition region the apparent rate constant (the reciprocal of the relaxation time) is

$$k_{app} = k_f + k_u \qquad (6)$$

in which kf and ku are the rate constants of folding and unfolding, respectively.
The rate constant kf depends on temperature according to the Arrhenius relation
with the high activation energy Ea of cis-trans isomerization (see above).
The temperature dependence of ku is much lower and its activation energy can
be calculated from Ea and the enthalpy of the reaction. A satisfactory fit of the
experimentally observed change of helicity F during heating or cooling is
obtained when the rate law

$$\frac{dF}{dt} = k_f\,(1 - F) - k_u F \qquad (7)$$

is integrated with the starting conditions (1) F=1 at t=0 for heating and (2)
F=0 at t=0 for cooling. Case 1 yields the time course

$$F = \frac{K}{1+K} + \frac{1}{1+K}\,e^{-k_{app} t} \qquad (8)$$

and case 2

$$F = \frac{K}{1+K}\left(1 - e^{-k_{app} t}\right) \qquad (9)$$

with K = kf/ku. The hysteresis loops can be constructed by calculation of F
after different times at different temperatures. Obviously, for infinite times t the
equilibrium curve F=K/(1+K) is obtained.
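
Equations (8) and (9) are the two special cases F(0) = 1 and F(0) = 0 of the general solution of Eq. (7), which can be evaluated with a single function. The sketch below is only meant to illustrate how a point on a heating or cooling branch is obtained at a fixed temperature; the function name and the rate constants used in the usage note are hypothetical.

```python
import math

def helicity_after(t, k_f, k_u, f_start):
    """Helicity F after time t (s) at constant temperature.

    General solution of dF/dt = k_f (1 - F) - k_u F with F(0) = f_start.
    f_start = 1 reproduces Eq. (8) (heating), f_start = 0 Eq. (9) (cooling).
    """
    k_app = k_f + k_u        # Eq. (6)
    f_eq = k_f / k_app       # equilibrium value K/(1+K) with K = k_f/k_u
    return f_eq + (f_start - f_eq) * math.exp(-k_app * t)
```

With assumed values kf = 0.002 s⁻¹ and ku = 0.001 s⁻¹, the heating branch (`f_start=1`) and the cooling branch (`f_start=0`) still differ clearly after 100 s, while both converge to the equilibrium value K/(1+K) = 2/3 for long waiting times. Repeating the calculation over a temperature ramp with temperature dependent rate constants would trace out the hysteresis loop.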
The mechanism of the hysteresis for long triple helices of natural collagens
is clearly more complicated. Unlike the short proteins, they do not fold in an
all-or-none reaction, as demonstrated by a cooperative length which is about 1/10
of the length of the molecule [115]. A possible mechanism was proposed by
Engel and Bächinger [81].


12 
Mutations in Collagen Triple Helices Affect Proper Folding


Mutations in collagen genes cause a number of severe inherited diseases 
[118, 119]. The best-studied collagen related genetic disease is osteogenesis 
imperfecta (brittle bone disease). Point mutations at different positions of the 
triple helix lead to improperly folded and unstable triple helices, disturbed fiber
formation and severe pathological changes of the collagen matrix. Mutations 
near the C-terminus tend to be more severe than similar mutations near the 
N-terminus. In view of the steric need of small glycine residues in every third 
position (see above), mutations of glycines to residues with larger side chains 
are particularly disturbing and cause large decreases of transition tempera- 
tures. In some cases, even kinks were visualized in such mutated collagens 
[120]. A delayed triple helix formation of mutant collagen from patients with 
osteogenesis imperfecta was observed [121]. Hydroxylation of prolines only 
occurs in the unfolded state and consequently the extent of hydroxylation was 
much increased in the mutated collagens, apparently at the N-terminal side 
of the mutation. Recently model peptides were studied with sequence irregu- 
larities designed after important natural mutations [122]. With this approach, 
it is hoped to gain deeper explanations of how a single point mutation in the 
triple helix may cause a global pathological condition. 
Abstract The collagen family is highly complex and shows a remarkable diversity in molecular
and supramolecular organization, tissue distribution and function. Collagen types are
classified in several sub-families according to sequence homologies and to similarities in their 
structural organization and supramolecular assembly. This overview covers advances made 
in this field during the last five years. Due to space limitation, the reader is referred to previ- 
ous reviews for the topics not discussed hereafter. We focus on recently described fibrillar 
collagens (collagens XXIV and XXVII) and FACITs (collagens XVI, XIX, XX, XXI and XXII), 
multiplexins (collagens XV and XVIII), membrane-collagens (collagens XIII, XVII, XXIII and 
XXV) and collagen XXVI, on other proteins containing triple-helical domains including the 
members of the new Emu family and on the structure and functions of several 
non-collagenous domains found in collagens. We also discuss data on collagen-related diseases with 
particular emphasis on gene therapy and on the involvement of collagens in neurodegener- 
ative diseases, which emerge as a major threat for public health in aging populations. 


Keywords Collagen - Collagen-related diseases - Extracellular domains 


List of Abbreviations 


BMP Bone Morphogenetic Protein 
COL Collagenous domain 
CQT5 C1q tumor necrosis factor-related protein 5 
CRR Cysteine-Rich Repeat domain 
DDR Discoidin Domain Receptors 
EMI N-terminal cysteine-rich domain found in the Emu family 
Emu Emilin and multimerin 
ERK Extracellular-signal-Regulated Kinase 
FACIT Fibril-Associated Collagen with Interrupted Triple Helix 
FN3 Fibronectin type III repeat 
FAK Focal Adhesion Kinase 
MARCO MAcrophage Receptor with COllagenous structure 
NC Non-Collagenous domain 
PKC Protein Kinase C 
RDEB Recessive Dystrophic Epidermolysis Bullosa 
TSPN Thrombospondin N-terminal-like domain 
SR-AI, SR-AII Scavenger receptors type A 
TGF-β Transforming Growth Factor β 
TNF Tumor Necrosis Factor 
vWA von Willebrand factor A-like domain 
1 

Introduction 


Collagens are trimeric molecules made of three polypeptide α chains, which 
contain the sequence repeat (G-X-Y)n, X being frequently proline and Y 
hydroxyproline. These repeats allow the formation of a triple helix, which is the 
characteristic feature of the collagen superfamily. The side chains of each X and 
Y residue are at the surface of the triple helix, giving the collagen molecule 
a significant capacity for lateral interactions with other molecules of the ex- 
tracellular matrix and resulting in the formation of various supramolecular 
assemblies. Collagens are modular proteins containing not only triple-helical 
(COL) but also non-triple-helical (NC) domains found in a number of other extracellular 
matrix proteins [1, 2]. Since many collagen chains were initially 
characterized by partial sequencing missing the sequences encoding the 
N-terminus of the protein, investigators in the field numbered these domains 
starting from the C-terminus of the protein. The reverse order has however 
been used by others. This is the case for collagens VII [3], XII [4], XXII [5], 
XXIII [6], XXV [7] and XXVI [8]. 
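
As a rough illustration of the (G-X-Y)n repeat described above, the following Python sketch scans a one-letter amino acid sequence for positions where glycine is missing from every third residue. The function names, the example sequences and the use of "O" as shorthand for hydroxyproline are assumptions made for illustration only. The scan assumes that the sequence is already in register (the first residue is the glycine of the first triplet) and it does not handle insertions or deletions that shift the triplet frame.

```python
from typing import List, Tuple


def find_gxy_interruptions(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position, residue) for each triplet whose first
    residue is not glycine.

    Assumption: seq is in register, i.e. seq[0] is the Gly of triplet 1.
    Only substitutions are detected; frame-shifting gaps are not modelled.
    """
    seq = seq.upper()
    return [(i, seq[i]) for i in range(0, len(seq), 3) if seq[i] != "G"]


def is_perfect_gxy(seq: str) -> bool:
    """True if seq consists only of complete G-X-Y triplets."""
    return len(seq) > 0 and len(seq) % 3 == 0 and not find_gxy_interruptions(seq)


# Hypothetical model peptides ("O" = hydroxyproline, illustrative shorthand).
perfect = "GPO" * 5
mutant = "GPOGPOAPOGPOGPO"  # Gly replaced by Ala in the third triplet

print(is_perfect_gxy(perfect))
print(find_gxy_interruptions(mutant))
```

A position-based scan of this kind is only a first filter: real collagenous domains are delimited by comparison with annotated sequences, not by the glycine periodicity alone.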

Collagen nomenclature is somewhat confusing. Although there are no offi- 
cial criteria, it is generally considered that, for a protein to be called a collagen, 
it should contain at least one triple helical domain, should be able to form 
supramolecular aggregates and should be deposited within the extracellular ma- 
trix. Most of the members of the collagen family fulfill these criteria. However, 
collagens XIII, XVII, XXIII and XXV contain a transmembrane domain and do 
not appear to form supramolecular structures of their own, but they contain 
collagenous domains located within the extracellular matrix. Multiplexins (mul- 
tiple triple helix domains and interruptions) including collagens XV and XVIII 
contain a triple helical domain with multiple interruptions and are found in 
basement membrane zones, but their supramolecular structures are still un- 
known. On the other hand, some molecules such as emilins, which contain a 
collagenous domain and are associated with supramolecular assemblies (elastic 
fibers) in the extracellular matrix, are not classified as collagens at this time. 

With the completion of the sequencing of the human genome, it is likely that 
most human collagen genes have now been discovered. The first collagens to be studied 
were extracted from tissues and characterized at the protein level by various biochemical 
methods (Fig. 1). Later, most of the newly discovered collagens were first 
identified as cDNA clones. One of the most recently described collagens, collagen 
XXVI, was identified by yeast two-hybrid screening of a whole mouse embryo 
cDNA library using the collagen-specific molecular chaperone HSP47 as bait [8]. 
Fig. 1 Discovery of the collagen superfamily members: a 50-year story 


2 
The Collagen Family: an Update 


For a comprehensive coverage of the structure and functions of collagens 
I-XIV, which are not discussed in this issue, the reader is referred to detailed 
reviews on fibrillar collagens [9] and unconventional collagens including collagen 
VII, network-forming collagens (VI, VIII and X) and the FACITs (IX, XII, 
XIV, XVI and XIX) [10] and to recent general reviews [11, 12]. Due to space limitation, 
sequence data, splicing variants and domain organization of each collagen 
chain will not be detailed below. Swiss-Prot or TrEMBL accession numbers 
for all the members of the collagen superfamily (Tables 1 and 2) are provided 
to give the reader easy access to these data. Intermolecular interactions of collagens 
with other constituents of the extracellular matrix and with cells fall outside 
the scope of this overview, and the reader is referred to the Protein Profile Series 
dedicated to fibrillar [9] and unconventional collagens [10] for a full report 
on these topics. This section will focus on the most recently described collagens 
and on proteins containing triple-helical domains, the number of which has 
also increased in the past years. 
2.1 
Fibrillar Collagens 


2.1.1 
Features of Fibrillar Collagens 


The fibril-forming or fibrillar collagens represent the most abundant product 
synthesized by connective tissue cells. As a consequence, they were also 
the first members of the collagen superfamily to be discovered. Since 1979, the 
year collagen XI was discovered, the fibrillar collagens comprised five members: 
the quantitatively major fibrillar collagen types I, II and III, and the minor 
collagen types V and XI [13]. As no novel fibrillar collagen had been described 
until very recently, the term “classical” collagens is also commonly used for 
this subset of the collagen family. Fibrillar collagens all share a common 
chain structure composed of a large collagenous domain (COL1) bordered by 
the N- and C-terminal extensions called the N- and C-propeptides, respectively. 
The C-propeptide is referred to as the NC1 domain. The N-propeptide is divided 
into sub-domains: a short sequence (NC2) that links the major triple helix to the 
minor one (COL2), and a globular N-terminal end (NC3) that can show structural 
variations and alternative splicing (Fig. 2A). Notably, alternative splicing 
within the NC3 domain of proα1(II) mRNA generates a long form of collagen II, 
collagen IIA, and a short form, IIB, with distinct tissue localization and function 
(see below). Fibrillar collagens are synthesized as 
precursors containing both propeptides, referred to as procollagens, that are 
secreted into the extracellular space. Fibrils represent the final product of classical 
fibrillar collagen biosynthesis. Thus far, only fibrillar procollagens require 
proteolytic removal of the N- and C-terminal extensions to achieve a mature and 
functional molecule. This step was considered an absolute prerequisite for 
correct fibril formation (see [14, 15] for reviews). This dogma was revisited with 
the observation that the mature molecules of the minor fibrillar collagens V and XI 
retain part of their N-terminal extension. The persistence of the N-propeptide 
is in part responsible for the control of heterotypic fibril growth by sterically limiting 
lateral molecule addition (see [13] for review). The N-propeptide of the 
classical fibrillar collagens represents the domain that shows the greatest 
sequence and structure variations between the different members. Nevertheless, 
almost all fibrillar collagen chains can be divided into two separate groups 
according to the structure of their N-terminal globular domain, referred to as the 
NC3 domain (Fig. 2A). The NC3 domains of the α1(I), α1(IIA), α1(III) and α2(V) 
chains all harbor a cysteine-rich repeat domain (CRR) and will be designated hereafter 
as the CRR-containing chains, while a thrombospondin N-terminal-like 
(TSPN) domain is found in α1(V), α3/α4(V), α1(XI), α2(XI), α1(XXIV) and 
α1(XXVII). New interesting data have been reported on the structure and functions 
of these two domains that will be presented below. 


Fig. 2A,B Fibrillar collagens and FACITs. A The fibrillar collagens. B The FACIT collagens 
2.1.2 
The Novel Fibrillar Collagens: Collagens XXIV and XXVII 


Unexpectedly, last year, two novel members of the fibrillar collagen group, 
numbered collagens XXIV [16] and XXVII [17, 18], were identified. They 
both share the structural features of fibrillar collagens, namely a long collagenous 
domain flanked at both extremities by globular N- and C-propeptides. 
Their large amino-terminal domains (up to 550 residues) are closely related to 
those of collagens V and XI, which comprise two sub-domains, the thrombospondin 
N-terminal-like motif, referred to as TSPN, and a variable region 
(Fig. 2A). The C-propeptide is well conserved and presents 8 cysteine residues, 
suggesting a possible homotrimeric association of these collagens. However, 
these two novel collagens present several features unusual for mammalian collagens, 
but shared with invertebrate fibrillar collagens [19]. The major triple helix 
differs substantially from that of classical fibrillar collagens. It is shorter than the 
equivalent domain of the classical collagens and contains imperfections in 
the (G-X-Y)n repeats. The significance of triple helix imperfections is not clear 
and biochemical investigation is required to elucidate their physiological rel- 
evance. Minor interruptions of the G-X-Y repeats in the major triple helical do- 
main of the classical fibrillar collagens are known to cause disease [15]. How- 
ever, triple helix imperfections resulting in kinks along the rigid rod-like 
molecular structure might also have biological implications. These particular 
flexible regions might hold potential binding sites accessible for molecular and 
cellular interactions or for protease activity. The total length of the major triple 
helix of the novel collagens varies from 991 to 997 amino acids depending on 
the collagen chains and species, but is always shorter than the equivalent fib- 
rillar collagen domains (1014-1020 residues). However, the exact number of 
residues of the collagenous domain is debatable and may vary depending on 
the inclusion or not of a short triple helical sequence found in these collagens 
in close proximity to the main triple helical domain. Based on the structural 
features of invertebrate fibrillar collagens [20], the first N-terminal triple he- 
lix interruption might be considered as a break between the minor and the ma- 
jor triple helix. The resulting minor triple helix would then consist of 19 triplets 
and the major triple helix would then be 931 residues in length. Human colla- 
gen XXVII also contains unexpected residues such as tryptophan and cysteine 
within the triple helix domain. Their significance is not clear since they are not 
all conserved among species and in the closely related collagen XXIV. 
Classical fibrillar collagens share the unique property to aggregate into 
highly-ordered fibrils. Most of the collagen fibrils are made up of different 
fibrillar collagen types and are referred to as heterotypic fibrils (Fig. 2A). The 
current view of their tissue distribution is that heterotypic fibrils composed of 
collagen I, III and/or V are widely distributed in connective tissues whereas 
cartilaginous tissues and vitreous contain collagen II/XI mixed fibrils. Whether 
the novel collagens can be incorporated into heterotypic banded fibrils is not known. 
Based on the known fibrillogenesis mechanisms, the unusual structural features 
of these collagens, such as the distortion and reduced length of the triple 
helical domain, argue against this possibility. 

Unlike the collagen V heterotrimer, the collagen V homotrimer, which presents a 
kink within the triple helix generated by the presence of an imino acid-poor region, 
does not incorporate into collagen I fibrils [21, 22]. The two “imperfect” 
fibrillar collagens might interfere with fibril formation and thus act as a nega- 
tive regulating factor of heterotypic fibril growth. The distinct expression pat- 
terns described for col24a1 and col27a1 genes indicate that a close association 
of collagen XXVII with collagens II and XI, and alternatively of collagen XXIV 
with collagens I and V is likely to occur. Collagen XXIV expression was found 
strictly confined to bone and cornea both containing heterotypic fibrils formed 
with collagens I and V solely, and is thus not found in collagen III containing 
tissues such as skin, tendon and vessels. Collagen XXVII exhibited a strong 
expression in cartilage and, for this reason, represents a new member of the 
cartilaginous subset of the fibrillar collagen family. 


2.1.3 
New Insights on the Classical Collagen V Structure and Functions 


Although the biosynthesis of classical fibrillar collagens has been extensively 
studied since their early discovery, some questions regarding their structure, 
processing and fibril formation surprisingly remain to be answered. Some 
of them have been elucidated during the last five years and are reviewed by 
others in this issue, notably concerning the enzymes ensuring the processing 
to the mature molecule and the molecular mechanisms that drive self-assembly 
of monomers. Another unexpected recent result came from collagen V chain 
studies. Apart from the most abundant form found in tissues, the heterotrimer 
[α1(V)]2α2(V), several other chain associations exist for collagen 
V, including hybrid molecules formed with both collagen V and XI chains, 
which strongly suggests tissue-specific functions for collagen V. The heterotrimeric 
form α1(V)α2(V)α3(V), only found in placenta [23], remained 
poorly studied until the recent determination of the primary structure of the 
human α3(V) chain, which will facilitate future research [24]. The 
homotrimeric form [α1(V)]3 was first observed in cell cultures and is likely 
to be present in embryonic tissues, though in amounts too small to be purified. 
Recently, however, using recombinant technology, the molecular characterization 
of the collagen V homotrimer and of its role in fibril formation has 
been achieved [21, 22]. 

The diversity of the collagen V molecular forms has recently been augmented 
by the identification of a novel member of the collagen V gene family 
referred to as α4(V) [25]. Whether this chain, identified in rat, represents a 
novel chain or corresponds to an orthologue of the mouse and human α3(V) 
chain is not clearly established yet. The rat proα4(V) chain presents a very high 
percentage of identity with the human proα3(V) chain. In fact, the α1(V), 
α3(V), α4(V), α1(XI) and α2(XI) chains are closely related and were proposed to 
constitute a separate clade among the fibrillar collagen genes [19]. However, 
from the current data, the α3(V) and α4(V) chains differ by several features. 
Whereas the α3(V) chain is present in placenta as the heterotrimer 
α1(V)α2(V)α3(V), the α4(V)-containing molecule shows a different heterotrimeric 
chain association, [α1(V)]2α4(V). Moreover, the proα4(V) chain presents a unique 
expression pattern. It was shown to be solely synthesized by Schwann cells and 
its expression is restricted to embryonic and early postnatal peripheral nerves. 
The α4(V)-collagen exhibits particularly high affinity (KD ~ 0.2 nmol/l) for the heparan 
sulfate transmembrane proteoglycan syndecan 3 [26]. Most importantly, 
this interaction with the cells is mediated by a heparin-binding site located in 
the variable region of the N-propeptide, a finding likely important for its biological 
function. This flexible domain, which persists on the mature molecule, is 
known to be accessible at the heterotypic fibril surface and thus, unlike the 
triple helix probably buried in the fibrils, could interact with receptors at the 
cell surface. A number of collagen V receptors have already been identified 
[27-30], but they all bind to the collagen V major triple helix. Other isoforms 
bind heparin and heparan sulfate, but the binding site was localized within the 
triple helical domain of the α1(V) chain [31, 32]. In addition, the α4(V)-collagen 
was shown to display domain-specific effects that regulate peripheral 
nerve development. The α4(V) N-propeptide recombinantly expressed as a 
monomer was shown to stimulate Schwann cell migration through its heparin-binding 
site whereas the major triple helix blocked neurite outgrowth [26]. The 
interaction through the N-propeptide domain was shown to induce actin cytoskeleton 
assembly, tyrosine phosphorylation and activation of the Erk1/Erk2 
protein kinases [33]. 


2.2 
FACITs (Fibril-Associated Collagens with Interrupted Triple Helix) 


2.2.1 
Recent Advances on Collagens IX, XII and XIV 


Besides a common general chain structure, the name FACIT comes from the 
fact that the first discovered members of this subfamily, collagens IX, XII and 
XIV, were shown to interact with collagen fibrils. The lateral interaction of 
collagen IX at the fibril surface through covalent crosslinks was proposed to act 
as a molecular bridge between cartilage fibrils and other matrix components. 
Recent advances on structural and functional aspects of this cartilaginous 
FACIT have been reviewed [34]. Although collagens XII and XIV localize at 
the surface of collagen I- or II-containing fibrils, the detailed molecular 
mechanisms of the interactions remain to be clarified [35, 36]. No covalent 
cross-linking involving these collagen types could be demonstrated and the 
current point of view is that collagens XII and XIV might interact with fibrils 
through the small matrix proteoglycans decorin and fibromodulin [37-39]. 
The functions of collagens XII and XIV have not been fully elucidated but a line of 
evidence puts forward a role in stabilizing and/or organizing the fibrillar network 
in the extracellular matrix. Collagen XIV was reported to promote gel contraction 
[40] and might act as a regulating factor of fibril growth in developing 
tendon [41]. Collagen XII expression is prominently induced in response to 
wounding caused by excess mechanical stress. Its production was three- to fivefold 
higher in fibroblasts cultured in tensed than in released collagen type-I 
gels. The intracellular mechanisms controlling mechano-dependent production 
of collagen XII required activation of the FAK-Erk1/2 pathway and of 
a distinct classical or novel PKC. Fibronectin was also up-regulated in cells under 
tensile stress but the induction of its expression is regulated by a different PKC-dependent 
pathway [42]. 


2.2.2 
Structure of the Novel FACITs: Collagens XVI, XIX, XX, XXI and XXII 


Recently, the FACIT sub-class of the collagen family has expanded to 8 members 
and henceforth includes the novel collagens XVI, XIX, XX, XXI and the 
most recently reported collagen XXII [5]. Although their size and their primary 
structure vary greatly, they all share the general structural features of the previously 
identified FACITs. The key features of this subfamily are two 
highly conserved cysteine residues separated by four amino acids at the NC1-COL1 
junctions and the presence of two G-X-Y triplet imperfections within the 
COL1 domain. They also display tremendous divergence in size (from 60 kDa 
to 220 kDa), particularly in their N-terminal NC domain. In addition, alternative 
splicing of collagens XII, XIV and XIX mRNAs and possibly of collagen 
XX occurs in some species, making their functional study a difficult task 
(see [10] for review). Two (collagens XII, XIV, XX and XXI) or more (collagens 
IX, XVI, XIX and XXII) short triple helical domains are connected and 
bordered by non-collagenous domains (Fig. 2B). The N-terminal non-collagenous 
domain diverges significantly among the different FACIT members. 
As indicated in the introduction and in Fig. 2B, collagen domains are commonly 
numbered from the C-terminus to the N-terminus. However, this convention 
is not always respected and the reverse order can also be found in the 
literature, which complicates matters for non-specialist readers. Since the 
number of collagenous domains in FACIT collagens varies, the N-terminal domains 
correspond to NC3 in collagens XII, XIV, XX and XXI, to NC4 in collagen 
IX, to NC6 in collagen XIX and to NC11 in collagen XVI, which contains the highest 
number of collagenous domains. FACITs usually contain a TSPN domain 
next to the collagenous domain. It represents the sole module constituting the 
N-terminal non-collagenous domain of collagens IX, XVI and XIX. 
Additional domains with structural homology to modules found in other 
molecules constitute the N-terminal domain of other FACITs. In collagens XII, 
XIV and XX, von Willebrand factor A-like domains (vWA) alternate with 
fibronectin type III repeats (FN3), while in collagens XXI and XXII a unique 
vWA domain is placed next to the TSPN domain (Fig. 2B). When particularly 
large, as in collagens XII and XIV, the N-terminal domain folds into a trident-like 
structure as observed by rotary shadowing, whereas the N-terminal domains 
of collagens IX or XIX form a large compact globular domain (reviewed in 
[10, 43]). Due to their numerous non-collagenous domains, FACITs are considered 
to be flexible molecules. This is supported by rotary shadowing images 
of the molecules that show several kinks interrupting the rod-like collagenous 
domains [5, 43]. 
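
The C-terminus-first numbering of domains described in this section can be made concrete with a small Python sketch. It assumes an idealized chain in which non-collagenous (NC) and collagenous (COL) domains strictly alternate, with an NC domain at each end, so that a chain with n COL domains has n + 1 NC domains. The function name is hypothetical, and the COL counts used for collagens IX, XIX and XVI (3, 5 and 10) are inferred here from the NC numbers quoted in the text rather than taken from sequence data.

```python
from typing import List


def label_domains(n_col: int) -> List[str]:
    """List domain labels in N- to C-terminal order, numbered from the
    C-terminus.

    Assumption: NC and COL domains alternate and the chain starts and
    ends with an NC domain.
    """
    if n_col < 1:
        raise ValueError("a collagen chain needs at least one COL domain")
    labels: List[str] = []
    for k in range(n_col, 0, -1):
        labels.append(f"NC{k + 1}")  # NC domain on the N-terminal side of COLk
        labels.append(f"COL{k}")
    labels.append("NC1")  # C-terminal non-collagenous domain
    return labels


# COL counts: 2 is stated in the text for collagens XII, XIV, XX and XXI;
# the other values are inferred from the NC numbers given above.
for name, n_col in [("XII", 2), ("IX", 3), ("XIX", 5), ("XVI", 10)]:
    domains = label_domains(n_col)
    print(f"collagen {name}: N-terminal domain {domains[0]}, "
          f"{len(domains)} domains in total")
```

Reading a label list produced this way from left to right gives the order of domains along the chain, while the numbers run in the opposite direction, which is the source of the confusion when authors number from the N-terminus instead.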


2.2.3 
Distribution and Function of the Novel FACITs 


Although some FACITs, notably collagens XVI [44, 45] and XIX [46], were dis- 
covered more than ten years ago, very little is known about the role of the novel 
FACITs in connective tissues. However, the establishment of their expression 
patterns was in some cases instructive and could further help in elucidating 
their functions. In the absence of strong evidence for specific functions, FACITs 
are usually considered to be implicated in the stabilization and integrity of the 
extracellular matrices. 


2.2.3.1 
Collagen XVI 


Collagen XVI is widely distributed in embryonic and adult tissues [47] but its 
supramolecular organization was reported to be tissue-specific [48]. In cartilage, 
collagen XVI resides in thin fibrils that also contain collagens II and XI, 
an expected localization for a member of the fibril-associated collagen family. 
In skin however, collagen XVI occurs at the dermo-epidermal interface in close 
vicinity to collagen VII, that forms the anchoring fibrils, and in the papillary 
dermis where it co-localizes with fibrillin-1. It is almost absent from the 
deeper layer, the reticular dermis [48-50]. Dermal fibroblasts, and to a lesser ex- 
tent keratinocytes, produce substantial amounts of collagen XVI [50]. Collagen 
XVI is synthesized as a homotrimer and does not undergo further processing 
[51]. Its expression is functionally regulated. It is enhanced during cell growth 
arrest [52] and in fibrotic diseases [49]. Unlike the situation in normal tissues, 
fibroblasts actively migrate and proliferate in fibrotic conditions. Thus, collagen 
XVI was proposed to play a role in stabilizing fibroblasts in dermal 
matrices. 


2.2.3.2 
Collagen XIX 


Based on its structural features and chain organization, collagen XIX belongs 
undoubtedly to the FACIT family. It appears more closely related to collagen 
XVI. However its preferential localization in basement membrane zones remains 
unique for a FACIT member. A homologue of this evolutionarily conserved 
collagen was identified as an adult-specific collagen of C. elegans cuticle [53]. 
Originally isolated from a human rhabdomyosarcoma cell line [46, 54], colla- 
gen XIX localizes in endothelial, neuronal, mesenchymal, and most epithelial 
basement membrane zones in all tissues tested [55]. In developing mouse 
embryos, transient but preferential expression of collagen XIX was detected in 
the myotome and coincides with the myogenic transcription factor myf-5 [56]. 
In addition, up-regulation of collagen XIX in rhabdomyosarcoma cells has been 
correlated with myogenic differentiation [57], raising the possibility that col- 
lagen XIX is involved in initial stages of skeletal muscle cell differentiation. Very 
recently, collagen XIX null mice provided unexpected insights regarding the 
specific role of collagen XIX in morphogenesis. Null mice showed perinatal 
lethality and the analysis of their phenotype revealed that collagen XIX is 
required for esophageal muscle transdifferentiation [58]. This finding is in 
good concordance with the previously established expression pattern of col- 
lagen XIX in developing embryos. Notably, collagen XIX expression was con- 
fined to few sites including the smooth muscle layers of the stomach and 
esophagus [56]. 


2.2.3.3 
Collagens XX, XXI and XXII 


Chick collagen XX is very closely related to collagen XIV and found to be 
localized to corneal epithelium, a pattern very close to that of collagen XII [59]. 
In contrast, collagen XIV was found in corneal stroma revealing a role in 
corneal compaction [60]. 

Phylogenetic analysis showed that collagen XXI is closely related to collagens 
XII, XIV and XX [61]. The expression of collagen XXI is developmentally regulated, 
exhibiting higher levels at fetal stages [62]. It localizes in several tissues including 
heart, stomach, kidney, muscle and placenta [63]. More specifically, collagen 
XXI is an abundant component of the extracellular matrix of vascular 
walls and is produced by smooth muscle cells. Its expression is stimulated by 
platelet-derived growth factor, indicating that collagen XXI might contribute to 
matrix assembly of the vascular network during blood vessel formation [62]. 

The very recently reported collagen XXII was presented as a novel marker 
of tissue junctions because of its unique tissue distribution. Collagen XXII 
immunodetection is restricted to the myotendinous and articular cartilage- 
synovial junctions and to the confined zone between the anagen hair follicle 
and the dermis [5]. Like other novel FACIT members, this collagen is believed 
to interact with components of microfibrils rather than with classical collagen 
fibrils. Tissue junction formation has been poorly investigated because of a 
lack of a good marker of this specialized region. It is likely that this newly iden- 
tified collagen will represent an excellent marker to elucidate mechanisms of 
junction formation during development and regeneration. 
2.3 
Membrane Collagens and Collagen-Like Proteins 


2.3.1 
Common Features 


The structure and functions of the currently known collagenous transmem- 
brane proteins have been recently reviewed, with a particular emphasis on 
collagen XVII as a ‘prototype’ of the group [64]. We will thus focus on charac- 
teristic features of this intriguing collagen subfamily. 

Membrane collagens are type II transmembrane proteins with the N-terminus 
inside the cell. This rare orientation occurs in only 5% of membrane proteins. 
Membrane collagens contain a single-pass hydrophobic transmembrane domain 
and include collagens XIII, XVII, XXIII and XXV, and membrane collagen-like proteins 
corresponding to ectodysplasin, MARCO (macrophage receptor with 
collagenous structure) and macrophage scavenger receptors [65]. The folding of 
the triple helix in the ectodomain of collagens XIII [66] and XVII [67] proceeds 
from the N- to the C-terminus, in opposite orientation to that of the fibrillar 
collagens. Membrane collagens XIII, XXIII and XXV contain two separate 
coiled-coil motifs, which may function as independent oligomerization domains 
[66, 68, 69]. Similar domains occur in the collagen-like membrane proteins 
MARCO and ectodysplasin-A1 [68]. A high identity was found across the 
20-amino acid C-terminal non-collagenous domain of membrane collagens 
XIII, XXIII and XXV [6]. 

Collagenous transmembrane proteins function both as cell surface receptors and as 
soluble matrix molecules, which are shed from the cell surface and remain trimeric after 
their release into the extracellular space. The cleavage generally occurs close to 
the extracellular face of the membrane. Ectodomain cleavage at a furin-like site 
has been reported for collagens XIII [66], XVII [70, 71], XXIII [6], XXV [7] and 
ectodysplasin [72]. Shed ectodomains of membrane collagens may have a structural 
and regulatory role and are available for binding to other components of 
the extracellular matrix, such as fibronectin, nidogen-2 and perlecan as reported for 
collagen XIII [73], or to receptors such as integrins (α1β1 for collagen XIII, α5β1 
and αVβ1 integrins for collagen XVII [74, 75]). In the same way, shed ectodysplasin 
binds to its receptor, the EDAR protein [72]. The ectodomains of collagens 
XIII [73] and XXIII [6] bind to heparin, which inhibits the shedding of collagen 
XIII in vitro, possibly by masking the arginine-rich cleavage site [73]. The release 
of the ectodomain of collagen XVII from the cell surface is associated with 
altered keratinocyte motility in vitro [71]. Shedding may also be involved in 
diseases since the shed ectodomain of collagen XVII is targeted by auto-antibodies 
in different blistering skin diseases [76]. 
2.3.2 
Membrane Collagens 


2.3.2.1 
Collagen XIII 


Type XIII collagen is a homotrimeric transmembrane collagen composed of 
a short intracellular domain, a single membrane-spanning region, and an 
extracellular ectodomain with three collagenous domains separated by short 
non-collagenous domains. Most of the type XIII collagen ectodomain appears 
to be present in triple helical conformation [77]. Alternative splicing of collagen 
XIII affects not only non-collagenous sequences but also the collagenous 
domains COL1 and COL3 [78]. Collagen XIII is a component of epidermal 
cell-matrix and cell-cell contacts in cultured keratinocytes [79]. It is concen- 
trated in cultured skin fibroblasts and several other human mesenchymal cell 
lines in the focal adhesions and it is found in a range of integrin-mediated ad- 
herens junctions in tissues [80]. It is likely involved in the adhesion of cells to 
basement membranes [73] and seems to participate in the linkage between 
muscle fiber and basement membrane, a function impaired by the lack of the 
cytosolic and transmembrane domains [81]. Abnormal adherens junctions in 
the heart have been observed in transgenic mice overexpressing mutant type 
XIII collagen, suggesting that it also has an important role in certain adhesive 
interactions that are necessary for normal development [82]. 


2.3.2.2 
Collagen XVII 


Collagen XVII, also referred to as bullous pemphigoid antigen 180 or BP180, is 
a keratinocyte transmembrane protein, which attaches the epidermis to the 
basement membrane in the skin [64, 83]. It is a major component of hemidesmosomes,
where it exists as a full-length protein and interacts with the hemidesmosomal
components BP230, plectin and the integrin α6β4 [84]. Its extracellular
domain is made of 15 collagenous domains interspersed in 16 non-collagenous 
domains. Its intracellular domain encompasses one third of the molecule and 
is longer than that of other membrane collagens [64]. The largest collagenous 
domain of type XVII collagen, COL15, promotes adhesion of epithelial cells 
and fibroblasts and is targeted by autoantibodies in bullous pemphigoid dis- 
eases [85]. 


2.3.2.3 
Collagen XXIII 


Collagen XXIII is a transmembrane collagen identified in metastatic tumor 
cells. It contains only 540 amino acids and consists of an amino-terminal 
cytoplasmic domain, a transmembrane region, and three collagenous domains 
flanked by short noncollagenous domains [6]. It is expressed in normal human
heart and retina and its expression is up-regulated in certain metastatic 
cancer cells [6]. The function of this collagen is still unknown. It may contribute 
to cell adhesion since it has one RGD site and three KGD motifs, which are 
involved in cell adhesion and migration on collagen XVII via integrins [75]. 
Collagen XXIII is able to bind heparin with low affinity and might interact with 
heparan sulfate chains either on the cell surface or within the extracellular 
matrix [6]. 


2.3.2.4 
Collagen XXV/CLAC-P (Collagen-Like Alzheimer Amyloid Plaque Component 
Precursor) 


Collagen XXV is the most intriguing membrane collagen. It was discovered
in a search for non-amyloid β components of plaque amyloid in the brain of
patients with Alzheimer's disease. Monoclonal antibodies were raised against 
senile plaque amyloid. The antigen corresponding to the antibody, which binds 
amyloid fibrils, was purified and the corresponding cDNA cloned [7]. The antigen
was named CLAC and its precursor CLAC-P or collagen XXV. It is a short
collagen containing 654 amino acids. It is made of three G-X-Y repeat domains
flanked by four non-collagenous domains. It is expressed specifically in neurons,
but not in astrocytes, microglia or meningeal fibroblasts. It is also expressed, al- 
beit at low level, in heart, testis and eye [7]. As reported for collagen XIII in other 
cell types [82], collagen XXV may play a role in adherens junctions between neurons
[7]. Its involvement in amyloid deposits in Alzheimer's disease will be discussed
below.


2.3.3 
The Membrane Collagen-Like Proteins 


Ectodysplasin A, MARCO and macrophage scavenger receptors belong to this 
group. 


2.3.3.1 
Ectodysplasin A 


The two longest isoforms of ectodysplasin (A1 and A2), which are comprised 
of 391 and 389 amino acids respectively, are collagenous trimeric membrane 
proteins with an extracellular domain made of 19 repeats of G-X-Y (with an
interruption between repeats 11 and 12) and of a C-terminal tumor necrosis 
factor-like domain [86]. Ectodysplasin co-localizes with cytoskeletal struc- 
tures at lateral and apical surfaces of cells [87]. It is a signaling molecule 
required for epithelial morphogenesis [88,89]. Mutations in the ectodysplasin 
gene cause X-linked anhidrotic ectodermal dysplasia, a human genetic disor- 
der of impaired ectodermal appendage development (absence or hypoplasia 
of hair, teeth and sweat glands [90]). 
2.3.3.2
Scavenger Receptors Type A and MARCO 


The scavenger receptors type A (SR-AI and SR-AII) and MARCO are classical 
type scavenger receptors that have collagen-like domains. Class A scavenger 
receptors are transmembrane glycoproteins that mediate both ligand inter- 
nalization and cell adhesion and participate in lipid metabolism [91]. They are 
comprised of a short cytoplasmic domain, a transmembrane domain, an
α-helical coiled-coil domain, a spacer domain and a triple helical domain [92]. In
addition, a scavenger receptor cysteine-rich domain is present at the C-terminus
of SR-AI. The overall structure of MARCO resembles that of SR-AI but
the α-helical coiled-coil domain is lacking [92-94]. MARCO is involved in the
anti-bacterial host defense. 


2.3.3.3 
Membrane-Type Collectin from Placenta (Collectin Placenta-1) 


This protein contains a coiled-coil region, a collagen-like domain and a carbo- 
hydrate recognition domain. However, it differs from other collectins in being a 
type II membrane protein [95] and resembles type A scavenger receptors where 
the scavenger receptor cysteine-rich domain is replaced by a carbohydrate recog- 
nition domain. Collectin placenta-1, which can bind and phagocytose not only 
bacteria but also yeast, might play important roles in host defenses that are 
different from those of soluble collectins in innate immunity [96]. 


2.4
Multiplexins 


2.4.1 
Common Features 


Collagens XV and XVIII are homotrimers with strong sequence and structural 
homologies in the C-terminal part of the molecule. They contain multiple triple 
helical domains (9 for collagen XV and 10 for collagen XVIII) hence their name 
Multiplexins (Multiple Triple Helix with interruptions). They both have at 
their N-terminus a thrombospondin-1 domain similar to the N-terminal he- 
parin-binding domain of thrombospondin-1 [97], which is also found in some 
fibrillar collagens and in FACITs. C-terminal heptad repeats have been found 
in the oligomerization domain of the multiplexins [69]. 

Both collagens house functional C-terminal domains, endostatin-XVIII and 
endostatin-XV (also called restin), which are released from the parent molecule 
by proteolysis as discussed below and share 61% sequence identity. They are
both hybrid molecules carrying glycosaminoglycan chains. Collagen XV and 
XVIII are proteoglycans bearing respectively chondroitin sulfate and heparan 
sulfate chains [98, 99]. Both collagens occur in the epithelial and endothelial 
basement membrane zones of a wide variety of tissues [100]. However, double
knockout mice reveal a lack of major functional compensation between
collagens XV and XVIII. Their biological roles are essentially separate, that of
collagen XV in the muscle and that of collagen XVIII in the eye [101]. 


2.4.2 
Collagen XV 


The human α1(XV) collagen chain contains a large amino-terminal non-triple
helical domain with a tandem repeat structure located in the C-terminal part
of the NC10 domain. The 45-amino acid residue sequence, which is not found
in collagen XVIII, is repeated four times and is similar to rat cartilage proteoglycan
core protein [102]. Type XV collagen occurs widely in the basement
membrane zones of tissues [103], but its function is not fully elucidated. Lack 
of type XV collagen in mice results in mild skeletal myopathy and increases vul- 
nerability to exercise-induced skeletal muscle and cardiac injury. Collagen XV 
appears to function as a structural component needed to stabilize skeletal mus- 
cle cells and microvessels [104]. It has been recently hypothesized that collagen 
XV might be a tumor suppressor [105]. 


2.4.3 
Collagen XVIII 


Unlike collagen XV, collagen XVIII is a heparan sulfate proteoglycan and the
first reported collagen with a heparan sulfate chain [99]. It contains ten domains of
triple-helical collagenous repeats separated by non-collagenous domains [106].

Collagen XVIII is localized in perivascular basement membrane zones and 
one of its physiological roles may be to maintain the structural integrity of basement
membranes. Indeed, its C-terminal fragment endostatin is often co-localized
with perlecan. Perlecan might serve as an adaptor molecule for endostatin
via its C-terminal domain endorepellin, which binds to endostatin [107], and 
could connect collagen XVIII to the basement membrane scaffold [108]. The role 
of collagen XVIII in basement membrane assembly is strengthened by the pres- 
ence within its C-terminal fragment of binding sites for heparan sulfate and 
laminin-1 [109-111] and by the ability of endostatin to bind other components 
associated with basement membranes, namely fibulin-1, fibulin-2 and nidogen-2 [112].

A knockout of the collagen XVIII gene in mouse has shown delayed regres- 
sion of blood vessels in the vitreous body and consequently impaired outgrowth 
of the retinal vessels, suggesting that collagen XVIII plays a critical role in the 
development and function of the eye, especially for normal induction of blood
vessel formation. Collagen XVIII has been shown to be important for anchor- 
ing vitreal collagen fibrils to the inner limiting membrane [113, 114]. This 
role is further supported by the finding that mutations in COL18A1 lead to an 
autosomal recessive disorder called Knobloch syndrome, which is defined by 
the occurrence of vitreoretinal degeneration with retinal detachment, high 
myopia, macular abnormalities and occipital encephalocele. The frameshift
mutations are located in different regions of the gene [115, 116]. Collagen XVIII 
has thus a major role in determining the retinal structure as well as in the 
closure of the neural tube. 

Collagen XVIII seems also to have a physiological role during organogene- 
sis. It participates in epithelial branching morphogenesis particularly in the 
lung and the kidney [117]. It may be involved in various developmental 
processes through the Wnt signaling pathway via its N-terminal frizzled domain 
[118]. Indeed the longest N-terminal collagen XVIII variant contains a cysteine- 
rich region homologous to the extracellular domain of frizzled receptors, which 
bind the Wnt signaling molecules. In contrast, the C-terminal fragment of the 
collagen XVIII molecule, endostatin, has been reported to be a potential inhibitor
of Wnt signaling [119]. The structure and functions of endostatin will
be detailed in section 3.6. Studies performed in Caenorhabditis elegans have 
shown that collagen XVIII is associated with axons and enriched near synaptic 
contacts and participates in synapse organization via its C-terminal NC1
domain [120] as discussed below. 


2.5 
Collagen XXVI and the Emu Family 


The members of the Emu (Emilin and Multimerin) family share an N-terminal
cysteine-rich domain (EMI domain). Four of these proteins, emilin1,
emilin2, and the emilin and multimerin-domain containing proteins emu1 and emu2,
which are secreted and deposited in the extracellular matrix, contain at least one
collagenous domain [121].


2.5.1 
Emilin1 and Emilin2 


Emilins (Elastin Microfibril Interface Located proteINs) are extracellular matrix 
glycoproteins that are associated with elastic fibers at the interface between 
amorphous elastin and microfibrils [122]. Both emilins possess a signal peptide, 
an N-terminal cysteine-rich domain, a long coiled-coil region and a short collagenous
sequence followed by a C1q globular domain [122]. The collagenous
domain of emilin1 is comprised of 17 uninterrupted G-X-Y triplets and is able
to form a triple helix. In emilin2 the 17 triplets have four interruptions and it is 
unlikely that this sequence could form a triple helix. 
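The distinction drawn here between uninterrupted and interrupted G-X-Y stretches can be illustrated with a short script. This is a minimal sketch, not a method taken from the cited work: the function names gxy_runs and longest_gxy_run are hypothetical, and the toy sequences are artificial Gly-Pro-Pro repeats rather than the real emilin1 or emilin2 sequences. The scan simply reports each maximal run of consecutive triplets that start with glycine, so an interruption shows up as a split into several shorter runs.

```python
def gxy_runs(seq):
    """Return (start_index, n_triplets) for each maximal run of Gly-X-Y triplets.

    Illustrative assumption: a triplet counts as Gly-X-Y whenever its first
    residue is glycine (one-letter code G); X and Y are not checked.
    """
    seq = seq.upper()
    runs = []
    i = 0
    while i + 3 <= len(seq):
        if seq[i] == "G":
            start = i
            n = 0
            # Extend the run triplet by triplet while glycine keeps its register.
            while i + 3 <= len(seq) and seq[i] == "G":
                n += 1
                i += 3
            runs.append((start, n))
        else:
            # Residue outside any triplet register: an interruption.
            i += 1
    return runs


def longest_gxy_run(seq):
    """Length, in triplets, of the longest uninterrupted Gly-X-Y run (0 if none)."""
    return max((n for _, n in gxy_runs(seq)), default=0)


# Hypothetical toy sequences (NOT real protein sequences).
uninterrupted = "GPP" * 17                    # 17 triplets in one register
interrupted = "GPP" * 5 + "A" + "GPP" * 12    # one extra residue breaks the register

print(gxy_runs(uninterrupted))
print(gxy_runs(interrupted))
```

In the toy example a single inserted residue splits the 17 triplets into two shorter runs. This illustrates only the sequence-level notion of an interruption; whether a given interrupted stretch can still fold into a triple helix cannot be decided by such a scan.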


2.5.2 
Emilin and Multimerin-Domain Containing Proteins (Emu1 and Emu2) 


These proteins show a similar structural organization comprised of an N-terminal
signal peptide followed by an EMI domain, two collagen stretches and a
conserved C-terminal domain of unknown function. These glycosylated proteins 
are capable of forming homo- and heterotrimers via disulfide bonding. Alternative
splicing gives rise to two isoforms, differing by 2 amino acids, of emu1
and emu2 [121]. It has been suggested that emu1 and emu2 might be incorporated
into collagen fibers as macromolecular connectors, similar to FACITs, with
the EMI and the C-terminal domains available for further interactions [121]. 


2.5.3 
Collagen XXVI is Identical to the Emu2 Protein 


The shortest variant of murine emu2 is 438 amino acids long and contains two
collagenous domains corresponding to 23 and 11 G-X-Y repeats respectively.
This protein is identical to the α1(XXVI) collagen chain discovered independently
by yeast two-hybrid screening of a 17.5 whole mouse embryo cDNA
library using the collagen-specific molecular chaperone HSP47 as a bait [8]. 
Collagen XXVI is expressed in testis and ovary during development and
also in adults. It has not been assigned so far to a collagen group due to a lack 
of extensive structural similarities with existing sub-families. It is comprised of 
two collagenous domains and thus is not a fibrillar collagen. It does not appear 
to be a FACIT because it does not contain either the conserved motif characterized
by the presence of two cysteine residues at the COL1-NC1 junction found
in other FACITs or a thrombospondin N-terminal-like domain. In contrast, col- 
lagen XXVI possesses two coiled-coil oligomerization domains found in mem- 
brane collagens XIII, XXIII and XXV, one of them being homologous to that of 
the membrane collagen XIII [68] but it does not contain a transmembrane se- 
quence and its ectodomain is not cleaved by furin convertase despite the pres- 
ence of a consensus furin protease cleavage site [8]. Furthermore, this protein 
has a unique feature relative to other collagens. Whereas in fibrillar collagens, 
the three α chains are associated in the C-propeptide region where they form
intermolecular disulfide bonds, the disulfide bonds that form trimers of collagen
XXVI are located in an N-terminal non-collagenous domain [8].


2.6 
Collagen-Like Molecules 


Type II membrane collagen-like molecules have been discussed above. This 
section deals with molecules containing a collagenous domain, but which are 
not integral membrane proteins. Although some of the molecules listed below 
are sometimes referred to as collagens, they are not assigned a Roman number. 
A letter has been used for the collagenous domain (collagen Q) of acetyl- 
cholinesterase [123]. 

Within this group, a number of proteins contain both a sequence comprised
of G-X-Y repeats and a C-terminal globular domain similar to that found in the
complement component C1q (see [10-12] for reviews). They include C1q [124],
chipmunk hibernation-related proteins, an inner ear-specific structural protein 
also referred to as saccular collagen [10], the adipocyte-derived hormone 
adiponectin [125], C1q-related factor, which is not secreted but expressed
exclusively in the cytoplasm of cells [126], and a novel short-chain collagen called
complement C1q tumor necrosis factor-related protein 5 (CQT5) [127]. Mutation
in the C1q domain of this protein results in late-onset retinal degeneration,
an autosomal dominant disorder with striking clinical and pathological simi- 
larities to age-related macular degeneration [127]. 

Other collagen-like molecules are collectins and ficolins, the humoral lectins 
of the innate immune defense. They possess a collagenous region and a carbo- 
hydrate recognition domain, which is a C-type lectin domain in collectins and 
a fibrinogen-like domain in ficolins [95]. There are four groups of collectins: the 
mannan-binding protein group, the surfactant protein A group, the surfactant 
protein D group, collectin liver 1 and a membrane-type collectin, collectin pla- 
centa 1, that functions as a scavenger receptor (see above). The human ficolins 
are L-ficolin, M-ficolin and H-ficolin. These molecules are sometimes referred 
to as defense collagens because they bind complex glycoconjugates on mi- 
croorganisms, thereby inhibiting infection, enhancing the clearance by phago- 
cytes and modulating the immune response [65]. Soluble defense collagens 
include the complement C1q, collectins, ficolins and adiponectin. Membrane- 
bound defense collagens are the type A macrophage scavenger receptors and 
MARCO [65], which were described earlier. 

Genome-based identification and analysis of collagen-related structural 
motifs in bacterial and viral proteins have been recently performed [128]. It is 
worth noting that bacterial proteins containing collagen-related structural 
motifs, which are usually found as cell surface or spore components, can form 
collagen-like triple helices despite the inability of bacteria to synthesize hy- 
droxyproline. Threonine in the Y position of the G-X-Y repeat can substitute 
for hydroxyproline in stabilizing the triple helix [129]. 


3 
Non Collagenous Domains of Collagens 


Comprehensive reviews on the structure of extracellular modules including fi- 
bronectin type III and von Willebrand factor type A domains [1] and on their 
organization in extracellular proteins have been published [2]. The functional 
role of A-domains in collagen VI has been previously reviewed [130]. We will 
deal in this section with the structure of individualized non-triple-helical domains.
We will mostly focus on the possible functions of the Cysteine-Rich
Repeat and of thrombospondin-1 domains and on the molecular structure of 
the NC1 domain of collagens IV, VIII, X, XV and XVIII that have been eluci- 
dated in the last years. These structures gave insights into the structural basis 
for collagen molecular and supramolecular assembly and into their biological 
functions. In addition to functions attributed to individual domains, new func- 
tional roles have been reported for a number of proteolytic fragments derived 
from the C-terminal domain of collagens IV, VIII, XV and XVIII found within, 
or in the vicinity of, basement membranes. These enzymatic fragments contain
exposed matricryptic sites and are collectively referred to as matricryptins 
[131]. The matricryptins issued from collagens IV, VIII, XV and XVIII inhibit 
angiogenesis and tumor growth [132-135]. Among these molecules, 
endostatin and tumstatin, issued from the C-terminal part of the α1(XVIII) and
α3(IV) chains respectively, are currently extensively investigated.


3.1 
Cysteine-Rich Repeat Domain


CRRs are found in a number of collagens and other proteins and are characterized
by a signature of ten cysteine residues in their sequence (see
[136] for review). Physical studies indicated that all ten cysteine residues are
disulfide-bonded, likely conferring its stability to the domain [137, 138]. This
domain is present in multiple copies in two homologous proteins, the Xenopus
chordin and the Drosophila sog, which bind to members of the TGF-β superfamily,
BMP-4 and decapentaplegic respectively. The release of these growth
factors by xolloid metalloproteinase activity establishes a gradient of available 
morphogens that play a crucial role in dorsal-ventral patterning [139, 140]. 
Particularly interesting has been the speculation of a functional connection 
between CRR domains of chordin and collagen IIA. Specific binding of BMP-2
and TGF-β1 to recombinant trimeric collagen IIA N-propeptide was clearly
established and strongly suggested a role of the collagen IIA N-propeptide in 
the regulation of growth factor delivery during chondrogenesis [141]. This 
finding was supported by the recent observation that the Xenopus collagen IIA 
N-propeptide also bound BMP-4 [142]. However, it is too early to speculate that 
all CRR-containing fibrillar collagens may have a similar role in developmental
processes. Indeed, mice that harbored a deletion of the col1a1 exon 2 encoding
the collagen I CRR-containing domain developed quite normally [143].
Whether the collagen III N-propeptide compensated for the lack of the collagen I NC3
domain or whether CRR repeats were necessary to bind growth factors efficiently has to
be clarified. It has been shown that individual chordin CRRs are less effective
than the four repeats present in the intact molecule. Unlike the homotrimeric
collagen IIB, collagen I contains only two CRR repeats since the third chain
present in the heterotrimeric molecule, α2(I), lacks this domain. Beyond the role
of CRR domains in collagens, this finding also questions the exact role of the 
N-propeptide of collagen I, which was previously implicated in a number of 
different collagen biogenesis events (molecule assembly, feedback regulation of 
procollagen synthesis, fibrillogenesis) recently reviewed [136]. 


3.2 
Thrombospondin-1 Domain 


The TSPN domain (~200 residues), first described in thrombospondin-1, is
present in some fibrillar collagens, in all members of the so-called FACITs subclass
of collagens and in the multiplexins (see above). The thrombospondin
motif occurring in collagens corresponds to the N-terminal heparin-binding 
domain of thrombospondin, but the amino acid residues involved in heparin 
binding are not conserved in the collagens. 

As described earlier, in fibrillar collagens the NC3 domain of the TSPN-containing
chains also possesses a variable region that substantially diverges
in size and in primary sequence from one chain to another. At the time of this
writing, the three-dimensional structure of the NC3 domain has not yet been
solved. However, rotary shadowing of the recombinant α1(XI) N-terminal domain
showed a distinct structural organization. The TSPN domain formed a
compact globule, whereas the variable region formed an extended tail [144]. 
The TSPN domain was shown to be released during N-terminal maturation 
of collagens V and XI but its function as a free released polypeptide or when 
attached to the procollagens has not been elucidated yet. In point of fact, very 
little is known about the structure and the function of this domain. A spontaneous
mutation in the human COL5A1 gene resulting in the deletion of exon
5, which encodes the sequence encompassing the BMP-1 cleavage site of the N-propeptide,
caused a connective-tissue disorder, the classical Ehlers-Danlos syndrome
[145]. Such a deletion might abolish the release of TSPN that normally
occurs during collagen V maturation and consequently disturb normal fibrillogenesis.
This observation might represent a novel and interesting lead to
tackle TSPN function.


3.3 
Kunitz Domain of the α3(VI) Chain


This domain is located at the C-terminus of the α3(VI) chain. It shows 40-50%
identity to a variety of serine protease inhibitors of the Kunitz-type [146], but 
lacks any inhibitory activity against trypsin, thrombin, kallikrein and several 
other proteases. Its crystal structure has been determined [146-148] and is very 
close to that of all other members of the Kunitz family. Two α helices border a
twisted antiparallel double-stranded β-structure [146]. The complete domain
was shown by NMR to be a highly dynamic molecule in solution, some 44% of
its main body structure showing multiple conformations [149]. 


3.4 
NC1 Domains of Collagen IV 


Triple-helical collagen IV molecules associate through their N- and C-termini 
forming a three-dimensional network, which provides basement membranes 
with an anchoring scaffold and mechanical strength. The six chains of collagen IV
assemble into three different triple helical protomers, which self-associate as three
distinct networks [150]. The specificity of both molecular and network assembly
is governed by amino acid sequences of the C-terminal noncollagenous
NC1 domain (~230 residues) of each chain [151].
The crystal structure of the {[α1(IV)]2α2(IV)}2 NC1 hexamer of bovine lens
capsule has been determined at 2.0 Å resolution. This was the first hexameric
structure to be solved and it provided the structural basis for supramolecular
assembly of collagen IV. The football-shaped hexamer is made up of two identical
trimers, each containing two α1 chains and one α2 chain. The hexamer
structure is stabilized by the extensive hydrophobic and hydrophilic interactions
at the trimer-trimer interface without a need for disulfide cross-linking
[152]. The NC1 monomer folds into a novel tertiary structure with predominantly
β-strands. The [α1(IV)]2α2(IV) trimer is organized through unique
three-dimensional swapping interactions. The 1.9-Å crystal structure of the
NC1 domain of human placenta collagen IV shows stabilization via a covalent
Met-Lys cross-link [153]. This hexameric NC1 particle is composed of two
trimeric caps, which interact through a large planar interface (Fig. 3). Each cap
is formed by two α1 fragments and one α2 fragment with a similar previously
uncharacterized fold, segmentally arranged around an axial tunnel. Each
trimer forms a quite regular, but nonclassical, sixfold propeller. The trimer-trimer
interaction is further stabilized by a previously uncharacterized type of
covalent cross-link between the side chains of a methionine and a lysine residue
of the α1 and α2 chains from opposite trimers [153]. Three methionine-93 side
chains from each cap are cross-connected with the three Lys-211 side chains 
from the opposite caps. 




Fig. 3 The human NC1 hexamer of collagen IV. Ribbon plot of the NC1 hexamer viewed
along the pseudoexact twofold axis. The two α1(IV) chains and the α2(IV) chain of each
trimer are shown in red and blue (the two α1 chains) and gold (α2). The intrachain disulfides are
indicated by their bridging sulfur atoms (large yellow balls) and the covalent cross-links
between Met-93 and Lys-211 are depicted as ball-and-stick representations. Reprinted with
permission from [153]. (Copyright 2002 National Academy of Sciences, USA)
The NC1 domains of the α1(IV), α2(IV) and α3(IV) chains contain proteolytic
fragments called respectively arresten, canstatin and tumstatin, which participate
in the control of tumor growth and angiogenesis [133, 134].


3.5 
NC1 Domains of Network-Forming Collagens (VIII and X) 


Collagens VIII and X are short chain collagens, which share with adiponectin
(see above) a common structural organization: an N-terminal domain, a triple helical
domain and a C-terminal globular NC1 domain (~160 residues) with
significant sequence homology to the C1q domain of the C1q subunits [154-157].
The determination of the structure of the C1q domain of collagens VIII and X
has given insights into the structural basis for their molecular and supramolecular
organization in tissues and, in the case of collagen X, into the mechanisms
underlying genetic defects (Schmid metaphyseal chondrodysplasia, see below).

The C1q-like C-terminal NC1 domains of collagens VIII and X form stable
trimers and are crucial for assembly of collagens VIII and X both at the molecular
and supramolecular levels. The driving force for the assembly was suggested
to be in the aromatic-rich terminal domain. The C1q domain forms a tightly
packed β-sandwich structure related to the tumor necrosis factor fold (Fig. 4 A,B).




Fig. 4 A Structure of the human collagen X NC1 trimer. Cartoon representation of the NC1
trimer viewed down the crystallographic threefold axis. The β strands in one subunit are
labeled A, A′, B′ and C-H. Ca2+ ions are represented as pink spheres. B As in A, but rotated by
90° about the horizontal axis. Reprinted from Structure (Camb). 10, Bogin O, Kvansakul M,
Rom E, Singer J, Yayon A, Hohenester E. Insight into Schmid metaphyseal chondrodysplasia
from the crystal structure of the collagen X NC1 domain trimer, 165-173, Copyright (2002),
with permission from Elsevier
The overall structures of the NC1 trimer of collagens VIII and X are quite
similar. Each subunit of the NC1 domain of collagens VIII and X consists of
a ten-stranded β-sandwich with jellyroll topology [158, 159]. The structure
of collagen X NC1 trimer, which has been solved at 2.0 Å resolution, reveals an
intimate trimeric assembly strengthened by a buried cluster of four calcium
ions. The compaction of loops around the Ca2+ cluster is likely to contribute
significantly to the stability of the trimer. Three strips of exposed aromatic 
residues on the surface of collagen X NC1 trimer, forming an aromatic patch, 
are likely to be involved in the supramolecular assembly of collagen X [158]. 
Equivalent strips exist in the NC1 domain of the closely related collagen VIII, 
suggesting a conserved assembly mechanism at the supramolecular level [159]. 
In contrast, the buried calcium cluster present in the collagen X NC1 trimer is 
not found within the collagen VIII NC1 trimer [159]. 

The C1q domain of adiponectin is an asymmetrical trimer of β-sandwich
protomers, each of which also has a ten-strand jellyroll folding topology identical
to that of TNF-α. The stable trimers formed by the C1q domain are stabilized by
a central hydrophobic interface [160]. 


3.6 
Endostatins XVIII and XV 


Endostatin is a 20-kDa proteolytic fragment derived from the last 183 amino 
acids of the non-collagenous C-terminal domain of collagen XVIII and will be 
further referred to as endostatin-XVIII. It inhibits endothelial cell proliferation 
and migration, induces apoptosis and inhibits angiogenesis and tumor
growth [161, 162]. Endostatin interacts with the α5β1 integrin, and to a lesser
extent with αvβ3 and αvβ5 integrins, on the surface of endothelial cells. The
significance of the integrin binding to endostatin may be to concentrate 
endostatin at the sites of neovascularization to function as a highly efficient 
angiogenesis inhibitor by a currently unknown mechanism [163]. Cellular 
actions and signaling by endostatin are complex issues and β-catenin has been
identified as a potential target [164, 165]. Endostatin has entered phase I and
II clinical trials in patients with solid tumors (phase I) and with metastatic neuroendocrine
tumors that are not amenable to resection (phase II) (see
http://cancertrials.nci.nih.gov). Endostatin might also be involved in tumorigenesis
in another way, because a coding single nucleotide polymorphism
(D104N) in this fragment predisposes to the development of prostatic adenocarcinoma
[166].

The crystal structure of murine [167, 168] and human [169] recombinant 
endostatin has been solved. Endostatin exhibits a compact fold distantly related 
to the carbohydrate recognition domain of mammalian C-type lectins and to 
the hyaluronan link module. Both molecules are centered on a seven-stranded
β-sheet and contain loops and 2 α helices, the α1 helix being located on
one side of the β-sheet and the α2 helix on the other side [167, 169] (Fig. 5A).
Endostatin contains two disulfide bridges in a nested pattern [167]. Structural 
Fig. 5 A Structure of human endostatin-XVIII. β-strands (cyan) are labeled in sequential
order A-P, α helices are violet and connecting loops are pink. Residues 1-6 are blue; zinc is
a black circle. Reprinted with permission from [169]. B Structure of endostatin-XV. Cartoon
drawing with β-strands in light blue, α-helices in pink and disulfide bridges in yellow.
The polypeptide chain termini are indicated and β-strands are labeled A-P. Reprinted with
permission from [112]. (Copyright 1998 National Academy of Sciences, USA)


information has provided useful hints regarding the heparin-binding site of 
endostatin. Eleven of the 15 positively charged arginine residues cluster on 
one face of the molecule and, as further investigated by site-directed mutage- 
nesis, this extensive basic patch contains the primary and secondary binding 
sites to heparin [110]. Human endostatin was found by X-ray crystallography 
to contain a zinc binding site, which is located at the N-terminus [169]. The 
coordination varies depending on the crystal [168], but zinc seems to be 
required for endostatin interaction with heparin/heparan sulfate [170] and for
its anti-angiogenic activity [171] at least in some cases. The modulation of
angiogenesis might not be the major physiological role of endostatin [172], 
which may participate in the anchoring of collagen XVIII to basement mem- 
brane as discussed above. Its effects are not restricted to endothelial cells since 
endostatin has been shown to inhibit the proliferation of tumor cells [170, 173]. 
Furthermore, the NC1/endostatin domain regulates cell migration and axon
guidance in Caenorhabditis elegans [174]. The involvement of this domain
in neurogenesis is supported by the fact that the deletion of the NC1 domain
results in defects in synapse organization and function, suggesting a role for the
NC1 domain in the organization of neuromuscular junctions in Caenorhabditis
elegans [120].

Endostatin-XV, also called restin (for related to endostatin), is the C-termi- 
nal proteolytic fragment of collagen XV. Its overall fold [112] is very similar to 
that of endostatin-XVIII [167]. Like endostatin-XVIII, it contains two α-helices, 16
β-strands and two disulfide bridges [112] (Fig. 5B). The face of the molecule that
binds heparin in endostatin-XVIII is considerably less basic in endostatin-XV, 
explaining why endostatin-XV does not bind heparin [112]. In contrast to en- 
dostatin-XVIII, endostatin-XV does not bind zinc. Thus, despite a high sequence 
identity, endostatins derived from collagens XV and XVIII differ in structural 
and binding properties, tissue distribution and anti-angiogenic activity [112]. 
Endostatin-XV inhibits bovine aortic endothelial cell migration and prolifer- 
ation and causes cell apoptosis [175, 176]. 


4 
Collagen-Related Diseases and Prospects for Gene Therapy 


Collagen-related diseases comprise genetic and acquired disorders. For
autoimmune diseases associated with collagens, the reader is referred to
reviews on the pathogenesis of Goodpasture's syndrome [150, 177] and on the
subepidermal blistering diseases associated with an immune response to
collagens VII [178] and XVII [83, 179]. Collagen genes and their corresponding
disorders have been recently reviewed [11, 12, 180] and we will not discuss
all of them.

The following section focuses on recently reported mutations (collagens VIII 
and XVIII) and on genetic diseases for which results concerning gene or 
cell therapies have been reported because these approaches hold considerable
promise for heritable diseases where no effective treatment is currently
available. Another field of intense investigation for gene therapy concerns the 
anti-angiogenic treatment of cancer with endostatin. Several studies have 
looked into the different possible ways in which endostatin may be
administered including gene therapy and cell encapsulation therapy [181-183].


4.1
Osteogenesis Imperfecta 


Osteogenesis imperfecta is a varied group of genetic disorders that lead to 
diminished integrity of connective tissues as a result of alterations in the genes 
that encode either the proα1 or proα2 chain of collagen I. Attempts at gene
and cell therapies have been performed using marrow stromal cells [184]. Cur- 
rent investigational strategies to treat osteogenesis imperfecta include ap- 
proaches for skeletal gene therapy and in particular the combined use of genetic 
manipulation and cellular transplantation. Somatic cell and gene therapies have 
been evaluated in laboratory and animal studies [185-187]. 


4.2 
Bethlem Myopathy 


Collagen VI is a ubiquitous extracellular matrix protein that forms beaded
filaments in tissues [10]. Inherited mutations in genes encoding collagen VI in 
humans cause two muscle diseases, Bethlem myopathy characterized by mus- 
cular dystrophy and joint contracture, and Ullrich congenital muscular dys- 
trophy [188]. An unexpected collagen-mitochondria connection [189] has 
been demonstrated in collagen VI-deficient (Col6a1-/-) mice, which have a 
muscle phenotype that strongly resembles Bethlem myopathy [190]. The loss 
of contractile strength is associated with ultrastructural alterations of sar- 
coplasmic reticulum and mitochondria and spontaneous apoptosis in muscle 
[190]. The mechanism leading to mitochondrial defects and apoptosis might 
involve integrins. Lack of collagen VI might cause mitochondrial dysfunction 
and increased permeability transition pore opening through an abnormal 
engagement of integrins, in keeping with the fact that integrin-mediated
signaling regulates mitochondrial function. Collagen VI myopathies thus have an
unexpected mitochondrial pathogenesis that could lead to therapeutic ad- 
vances [190]. 


4.3 
Alport Syndrome 


Collagen IV is a major structural component of basement membranes. In the 
glomerular basement membrane of the kidney, the α3, α4, and α5(IV) collagen
chains form a distinct network that is absent in most patients affected with
Alport syndrome, a progressive inherited nephropathy associated with mutations
in COL4A3, COL4A4, or COL4A5 genes [150]. Gene therapy of Alport syndrome 
aims at the transfer of a corrected type IV collagen alpha chain gene into renal 
glomerular cells responsible for production of the basement membrane. Several 
studies have been performed in experimental models. Adenovirus-mediated 
transfer of α5(IV) chain cDNA into swine kidney in vivo leads to the deposition
of the protein into the glomerular basement membrane. This indicates that
correction of the molecular defect in Alport syndrome is possible using in vivo
gene transfer into glomerular cells [151, 191]. In a canine model of Alport
syndrome, transfer of the α5(IV) chain gene to smooth muscle restores in vivo
expression of the α6(IV) chain, which requires the presence of the α5(IV) chain
for incorporation into collagen trimers. The adenoviral vector containing the
α5(IV) transgene will serve as a useful tool to further explore gene therapy for
Alport syndrome [192]. A human-mouse chimera of the α3α4α5(IV) collagen
protomer restores a functional glomerular basement membrane and rescues 
the renal phenotype in Col4a3-/- Alport mice [193]. 


4.4 
Epidermolysis Bullosa 


Epidermolysis bullosa comprises a family of inherited blistering skin diseases 
with variable clinical phenotypes caused by mutations in the COL7A1 gene 
[178, 194] and in the COL17A1 gene [64, 83]. Mutations in the COL17A1 gene are
responsible for certain forms of junctional epidermolysis bullosa and a rare 
subform of epidermolysis bullosa simplex [83], while mutations in the collagen 
VII gene lead to dystrophic epidermolysis bullosa [178]. 

Collagen VII is the major constituent of anchoring fibrils, which extend from 
the lamina densa of epidermal basement membrane into the underlying dermal
connective tissue. Because retroviral vectors cannot accommodate the
full-length collagen VII cDNA, a recombinant truncated type VII collagen minigene
has been developed and characterized for gene therapy of dystrophic
epidermolysis bullosa. The minigene product retains the functions and
characteristics of a full-length α1(VII) chain and its expression in keratinocytes from
patients with recessive dystrophic epidermolysis bullosa induced reversion of 
the recessive dystrophic epidermolysis bullosa phenotype [195]. The suit- 
ability of retroviral vectors for gene therapy of this disease has been further 
demonstrated in dogs expressing a mutated collagen VII [196]. High transfer 
efficiency, but also high expression levels, are required to ensure therapeutic 
efficacy in the presence of mutated gene products [196]. 

Restoration of type VII collagen expression and function in dystrophic epi- 
dermolysis bullosa skin in vivo has been successfully performed. The COL7A1 
gene was delivered using lentiviral vectors to keratinocytes and fibroblasts 
from patients with recessive dystrophic epidermolysis bullosa (RDEB). The 
gene-corrected cells were then used to regenerate human skin on
immune-deficient mice [197]. Stable nonviral genetic correction of inherited human skin
disease has been reported by introducing the COL7A1 gene with a bacteriophage
integrase into primary epidermal progenitor cells from patients with RDEB
[198]. This circumvents difficulties in stably integrating large corrective
sequences such as the large COL7A1 gene into the genomes of long-lived
progenitor-cell populations. Skin regenerated using these cells displayed stable
correction of hallmark RDEB disease features, including collagen VII expression,
anchoring fibril formation and dermal-epidermal cohesion [198]. The proof of
principle for genomic DNA vectors (P1-derived artificial chromosome) as a
means of restoring collagen VII production in an RDEB keratinocyte cell line has
been reported [199]. Intradermal injection of recessive dystrophic epider- 
molysis bullosa fibroblasts overexpressing collagen VII into intact RDEB skin 
stably restored collagen VII expression in vivo and corrected the major fea- 
tures of the disease. Injection of genetically engineered fibroblasts provides a 
simplified approach to correct human disorders of secreted matrix proteins 
[200]. 

Most patients suffering from junctional epidermolysis bullosa lack type XVII 
collagen mRNA due to nonsense-mediated mRNA decay. A retroviral expression 
vector for wild-type human collagen XVII has been delivered to primary ker- 
atinocytes lacking collagen XVII from patients with junctional epidermolysis 
bullosa. Restoration of full-length collagen XVII protein expression was asso- 
ciated with adhesion parameter normalization of primary junctional epider- 
molysis bullosa keratinocytes in vitro [201]. Other approaches and the issues to 
resolve in this particular field of gene therapy have been reviewed [83, 202]. 
Future perspectives include the modulation of splicing reactions in ker- 
atinocytes, which could be applied in patients with mutations disrupting the 
normal sequence of a splice site [202]. 


4.5 
Corneal Endothelial Dystrophies 


Collagen VIII is a major component of Descemet's membrane, a specialized 
basement membrane that separates endothelial cells from the stroma in cornea. 
The first description of mutations in collagen VIII in association with human
disease was reported only in 2001 [203], although this collagen was first
described in 1980. Missense mutations in COL8A2, the gene encoding the
α2(VIII) chain, cause two forms of corneal endothelial dystrophy.


4.6 
Schmid Metaphyseal Chondrodysplasia 


Collagen X is involved in chondrocyte hypertrophy and endochondral ossifi- 
cation. Mutations in the COL10A1 gene disrupt growth plate function and re- 
sult in Schmid metaphyseal chondrodysplasia with short stature and bowed 
bones. With the exception of two mutations that impair signal peptide
cleavage during α1(X) chain biosynthesis, the mutations are clustered within the
carboxyl-terminal NC1 domain [10, 204]. However, despite the different nature
of the NC1 and signal peptide mutations in collagen X, they all result in
impaired collagen X secretion [205].


5
Collagens and Neurodegenerative Diseases 


Cultured primary neurons express collagen XIII, which enhances neurite
outgrowth [206]. Collagen XXV/CLAC-P is specifically expressed by this cell
type [7] and endostatin was found in cortical neurons in Alzheimer's disease
brains [207]. The longest variant of collagen XVIII is also expressed in human
brain [116]. In addition, several collagen types are found to be associated with
cerebral deposits of the amyloid β-peptide (Aβ), which are neuropathological
lesions characteristic of Alzheimer's disease. Pathophysiological hallmarks of
the disease are the formation of senile plaques and blood vessels with amyloid
angiopathy. Targeting amyloid β-protein fibrillogenesis, which drives the
initiation and progression of Alzheimer’s disease, with molecular therapeutics
offers an opportunity to treat the disease.

Aβ deposits have been localized to the vascular basement membrane region
of capillaries and two collagens found in basement membrane, collagens IV
and XVIII, are associated with them. Collagens IV and XVIII are localized in
senile plaques in patients with Alzheimer’s disease [208 and references therein,
209]. Collagen XVIII is also associated with vascular amyloid Aβ deposits [209]
and its C-terminal fragment, endostatin, has been found in neuronal and
paracellular deposits in brains of patients with Alzheimer’s disease [207]. Endostatin
has the propensity to form cross-β structure and to aggregate into amyloid
deposits [210, 211]. Amyloid fibrils formed by endostatin bind and are cytotoxic
to murine neuroblastoma cells and to endothelial cells, the cytotoxicity being 
restricted to the aggregated amyloid form of endostatin [211]. Furthermore, 
amyloid endostatin induces plasminogen activation by endothelial cells, 
resulting in vitronectin degradation and plasmin-dependent endothelial cell 
detachment [212]. This suggests that the plasminogen activation system plays a
role in endostatin function. Amyloid endostatin may inhibit angiogenesis and 
tumor growth by stimulating the fibrinolytic system [212]. 

The extracellular CLAC domain of membrane collagen XXV (see above), 
generated by furin convertase, is massively deposited within extracellular
β-amyloid plaques. Both secreted and membrane-tethered forms of collagen type
XXV/CLAC-P specifically bind to fibrillized Aβ, implicating these proteins in
β-amyloidogenesis and neuronal degeneration in Alzheimer’s disease [7].
CLAC is associated with amyloid fibrils that form bundles intermixed with 
cellular processes and it has been suggested that its three collagenous domains, 
which are resistant to degradation by a number of proteases, may protect amy- 
loid deposits against degradation [7]. The Alzheimer disease amyloid-associ- 
ated protein called AMY was found to be identical to CLAC [213]. 

Another type II transmembrane collagen-like molecule, the scavenger 
receptor type A, has been shown to bind to the fibrillized form of Aβ and to
contribute to the scavenging of Aβ by microglial cells [214]. In Alzheimer’s
disease, microglial expression of the scavenger receptor type A is increased
[215]. The addition of the complement component C1q, which contains a
collagenous domain and is also associated with amyloid deposits, to preformed
Aβ aggregates results in significantly increased resistance to aggregate
resolubilization [216]. It would be of interest to know if the C1q-related factor, which
is expressed at highest levels in the brain [126], can also bind β-amyloid.

In contrast to C1q, which promotes fibrillization of Aβ in vitro [216],
collagen IV inhibits amyloid β-protein fibril formation in vitro by preventing
formation of a β-structured aggregate of Aβ40 and may affect fibril elongation
and nucleation [208]. Collagen IV disrupts preformed Aβ42 fibrils in vitro by
inducing a structural transition from β-sheet to random structures in Aβ42,
which is the initially deposited species in the plaques and can serve as a
nucleation site for fibril formation by soluble Aβ40 peptide [217].

The ability of collagen IV, and other basement membrane components such
as laminin and entactin, to induce disassembly of Aβ fibrils makes them
potentially effective therapeutic agents. This also opens new perspectives for
the development of novel treatment strategies for Alzheimer's patients that reduce
the deposition of senile plaques in the brain by modulating the
amyloidogenic pathway.


6 
Conclusion 


During the last decade, a number of new collagens have been identified thanks
in large part to the complete sequencing of the human genome. The collagen
superfamily should come to completion in the very near future and this may be
the opportunity to think about its nomenclature. As a matter of fact, some
collagen-like proteins such as emilins fulfill the criteria required to be
classified as collagens. Classifying collagen members into separate sub-families is not
a simple task, and “cross-talks” between two distinct sub-families are frequent
whatever the criteria chosen for their classification: gene structure, sequence
homology, structural organization, supramolecular assemblies, and tissue
distribution. For instance, the basement membrane collagens include collagens
IV, XV, XVIII and XIX, but if we consider their sequence and structural
organization, these collagens fall into three sub-groups: the multiplexins (collagens
XV and XVIII), the FACIT collagen XIX and collagen IV, which is the single
representative of its group. Another point to consider for future studies is that,
starting from collagen XII, all collagens discovered are supposed to form
homotrimers. Collagen XXVI might be able to form heterotrimers because Emu2,
which is identical to the α1(XXVI) chain as discussed above, is capable of
forming heterotrimeric complexes with the Emu1 protein via the EMI domains
[121]. It remains to be determined if the Emu1 protein can be considered as
another α chain of collagen XXVI. Based on their high structural identities, some
collagens could form hybrid molecules, notably collagens XXIV and XXVII, as
already shown for collagens V and XI. This could dramatically increase the
molecular diversity encountered in tissues.

Faced with the remarkable advances in collagen discoveries, the elucidation
of their specific properties and functions represents a real challenge for the
near future. Collagens sustain both biomechanical and regulatory functions
in almost all tissues. Some of the most striking recent examples have been
reviewed in this issue, notably the transmembrane collagens with the neuronal
collagen XXV as a component of Alzheimer amyloid plaques, the identification
of the so-called matricryptins including the potent anti-angiogenic endostatin
fragment derived from collagen XVIII and the N-propeptide domain, called
CRR, of the fibrillar collagen IIA that regulates availability of morphogens
during cartilage development.

Most of the recently identified collagens are present in tissues in trace
amounts, and studies of their molecular shape, processing and function have
relied on recombinant expression of the complete triple helical molecule or
selected domains. Following the development of efficient expression
systems, from bacteria to plants (reviewed in [218]), a number of key factors
of collagen structure and assembly have been elucidated and exciting aspects of
unexpected and extremely diverse collagen functions have been revealed. The
development of recombinant collagen expression also allows structural
studies. Knowledge of the three-dimensional crystal structures of non-collagenous
domains and the search for their interacting partners within the extracellular
matrix will help to provide a better understanding of their function. Recently,
the co-crystallization of an integrin collagen-binding domain with triple helical
peptides encompassing the cell binding site revealed the existence of
ligand-induced conformational changes, which probably underlie either the affinity
regulation or the signaling capacity of integrins [219]. Such methodological
developments are undoubtedly promising for analyzing the interactions of cells
and/or proteins with collagens. Mice carrying spontaneous or experimentally
induced mutations in collagen genes have proven helpful to highlight collagen
function during development, to understand collagen diseases and to develop
models for human gene therapy [220]. The generation of mutations in the
novel collagen genes will likely provide clues to their function. An interesting
new development for understanding collagen biology was boosted by the
recent complete genome sequencing of C. elegans. Some collagen members
appear to be well conserved during evolution [221], and the use of this powerful
model organism will facilitate genetic approaches to collagen
function.


Abstract Collagen is synthesized in the endoplasmic reticulum (ER) as procollagen, which
is the precursor protein that bears propeptide domains at either end of the triple helical
domain. The processes by which procollagen is synthesized in the lumen of the ER include
unique steps that are not found in the biosynthesis of globular proteins. First, each
polypeptide chain of procollagen (proα-chain) finds its correct partners, which enables the
formation of the distinct types of procollagen. Second, triple helix formation of long
Gly-X-Y repeats starts at a defined region, which results in the formation of a correctly aligned
triple helix and thereby prevents mis-staggering. The most characteristic step is the
formation of the triple helix. This step involves specific post-translational modifications, in
particular, the prolyl 4-hydroxylation of the Y-position amino acids that stabilizes the triple
helical conformation. The formation of the triple helix is a slow process compared to the
folding of globular proteins, and includes cis-trans isomerization of the many prolyl and
hydroxyprolyl peptide bonds. Recent advances have indicated that these processes are assisted
by a set of ER-resident molecular chaperones, such as protein disulfide isomerase (PDI),
peptidyl prolyl cis-trans isomerases (PPIases), heat-shock protein (Hsp)47, and prolyl
4-hydroxylase (P4-H). The intracellular trafficking of procollagen molecules has also been shown
to involve a pathway distinct from that utilized by small secretory proteins.


Keywords Procollagen · Triple helix · Molecular chaperone · Folding ·
Endoplasmic reticulum


1
Overview of Procollagen Biosynthesis 


Collagens occur ubiquitously in multicellular animals as they are the major
components of the extracellular matrices (ECMs). The feature that characterizes the
collagens is that they consist of tandemly repeated Gly-X-Y triplets and have a
unique triple helical structure. The number of triplet repeats ranges from a few
dozen to 510 depending on the collagen type. Some types of collagen are
homotrimeric proteins that consist of three identical α-chains, such as types II
and III, while others are heterotrimers of different α-chains. For example, type I
collagen consists of two α1(I) chains and one α2(I) chain.
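
As a rough illustration of the Gly-X-Y repeat described above, the following Python sketch counts the longest uninterrupted run of triplets that begin with glycine in a one-letter amino acid sequence. The function name and the toy sequences are hypothetical and are not taken from any collagen database; real collagenous domains often contain interruptions that this simple scan does not handle.

```python
def longest_gly_xy_run(seq: str) -> int:
    """Return the largest number of consecutive Gly-X-Y triplets in seq.

    A triplet is any three residues whose first residue is glycine (G);
    X and Y may be any amino acid. This is a hypothetical helper written
    for illustration only.
    """
    seq = seq.upper()
    best = 0
    for start in range(len(seq)):
        if seq[start] != "G":
            continue
        count = 0
        i = start
        # Step through the sequence three residues at a time while each
        # complete triplet still begins with glycine.
        while i + 2 < len(seq) and seq[i] == "G":
            count += 1
            i += 3
        best = max(best, count)
    return best


# Toy sequences (assumptions, not real collagen chains).
perfect_repeat = "GPPGPPGPP"      # three uninterrupted triplets
interrupted = "GPPAGPPGPP"        # an extra residue breaks the register
print(longest_gly_xy_run(perfect_repeat))
print(longest_gly_xy_run(interrupted))
```

The interrupted toy sequence shows why the register matters: a single inserted residue splits the run, so the longest uninterrupted stretch is shorter than the total number of glycine-led triplets.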

To date, 27 types of collagen consisting of specific α-chains that are encoded
by over 40 different genes have been identified. Most collagen types assemble
into supramolecular architectures either by themselves or with the aid of other
ECM components, including other types of collagen. The collagen family
proteins are classified into the following subfamilies based on their molecular
and supramolecular structures:


1. Fibril-forming collagens (types I, II, III, V and XI)

2. Network-forming collagens (types IV, VIII and X) 

3. Fibril-associated collagens with interrupted triple helices (FACITs; types IX,
XII, XIV, XVI and XIX)

4. Transmembrane collagens (types XIII and XVII) 

5. Other types of collagens 


In addition, a number of collagen-related proteins, such as complement 1q 
(C1q), lung surfactant proteins, collagen-like lectins (collectins) [1, 2], the tail
of the asymmetric form of acetylcholinesterase [3], and macrophage scavenger 
receptors [4, 5], also possess collagenous triple helices composed of Gly-X-Y 
repeats. 

All of the collagen family proteins are secretory proteins that pass through 
the endoplasmic reticulum (ER), where they are folded into triple helical
molecules. All collagens also contain non-collagenous domains; in the case of the
fibril-forming collagens, these are recognized as propeptide domains. In this
chapter, we focus on the pathway by which collagen is synthesized within the 
cell. Unless otherwise indicated, this chapter will center on the biosynthesis of 
the fibril-forming collagens since most of the studies on collagen biosynthesis 
have examined this type of collagen (mainly types I and III). 

The fibril-forming collagens form similar molecular and supramolecular 
architectures as they all generate cross-striated fibrils with an axial D peri- 
odicity that is typically 67 nm. The individual molecules consist of long triple 
helical domains comprised of approximately 330 Gly-X-Y triplets per chain, 
and these form rope-like molecules with a length of 300 nm. The biosynthesis 
of these collagen structures involves a unique pathway that consists of a triple 
helix-forming process coupled with a number of specific post-translational 
modifications. Collagen is first synthesized and secreted as procollagen, a larger 
precursor protein (Fig. 1). These procollagen molecules possess a central triple 
helical domain and non-collagenous propeptide domains at both their N- and 
C-terminal ends. Both propeptide domains are cleaved off in the final step of 
collagen biosynthesis. 
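
The figures quoted above can be checked with simple arithmetic. The sketch below assumes an axial rise of about 0.29 nm per residue for the collagen triple helix; this value is not given in the chapter and is used here only for illustration, as are the variable names.

```python
# Values taken from the text.
TRIPLETS_PER_CHAIN = 330       # approximate number of Gly-X-Y triplets per chain
RESIDUES_PER_TRIPLET = 3       # Gly, X and Y
QUOTED_LENGTH_NM = 300         # molecule length stated in the text

# Assumption (not from the text): approximate axial rise per residue.
RISE_PER_RESIDUE_NM = 0.29

helix_residues = TRIPLETS_PER_CHAIN * RESIDUES_PER_TRIPLET
helix_length_nm = helix_residues * RISE_PER_RESIDUE_NM

print(f"Residues in the triple helical domain: {helix_residues}")
print(f"Estimated helix length: {helix_length_nm:.0f} nm "
      f"(text quotes about {QUOTED_LENGTH_NM} nm)")
```

Under this assumption the estimate comes out slightly below the quoted 300 nm, which is consistent with both figures in the text being approximate.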

Procollagen biosynthesis was first outlined in the late 1970s [6]. The pathway
that was described then is basically compatible with the current knowledge in
this research field (Fig. 2). Since each proα-chain contains a signal sequence
at its N-terminus, the nascent polypeptide is co-translationally translocated
into the lumen of the rough ER, like other secretory proteins. Post-translational
modifications of the triple helix-forming region are initiated during this
translocation step [7]. The post-translational modifications include prolyl
4-hydroxylation at the Y-positions, prolyl 3-hydroxylation at the X-positions, and
lysyl hydroxylation to form hydroxylysine (Hyl) residues. Some of the Hyl
residues are subsequently glycosylated to form galactosyl-Hyl and
glucosyl-galactosyl-Hyl residues. After the polypeptide translocation is completed, three
proα-chains trimerize at the C-propeptide domains. The association of the
C-propeptides is stabilized by intermolecular disulfide bridges. It should be
noted that the association of the C-propeptides must be accomplished with
correct pairing according to the collagen type.

Fig. 1 Structure and shape of procollagen

Fig. 2 Procollagen biosynthesis

Following C-propeptide assembly, the formation of the triple helix starts
from a C-terminal nucleus and then propagates toward the N-terminus, similar
to the fastening of a zipper. This triple helix-forming process involves prolyl
4-hydroxylation of up to 100 Pro residues per chain, which increases the thermal
stability of the triple helix. This is shown, for instance, by the fact that the
blocking of prolyl 4-hydroxylation by α,α′-dipyridyl, which chelates the Fe(II) ions
that are essential for prolyl 4-hydroxylase (P4-H) activity, inhibits intracellular
triple helix formation [8].

The procollagen molecules that have completed their modifications and 
folding are then transported to the Golgi apparatus. In the Golgi cisternae, the 
procollagen molecules are stacked laterally and form aggregates [9, 10]. Finally, 
the procollagen aggregates are secreted into the extracellular space, where the 
N- and C-propeptides are enzymatically cleaved off, thus generating the mature 
collagen molecules. Upon cleavage of the propeptides, the triple helical collagen 
molecules self-assemble into fibrillar supramolecules in which each molecule is 
displaced about one-quarter of its length along the axis of the fibril. 
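
The "about one-quarter" displacement can be compared with the 67 nm D periodicity and the 300 nm molecule length quoted earlier in this chapter. The sketch below is only an arithmetic illustration; the helper function and the choice of five neighbouring molecules are assumptions made for the example.

```python
MOLECULE_LENGTH_NM = 300.0   # molecule length quoted in this chapter
D_PERIOD_NM = 67.0           # typical axial D periodicity quoted in this chapter


def stagger_offsets(n_molecules: int, stagger_nm: float) -> list[float]:
    """Axial start positions of neighbouring molecules shifted by a fixed stagger.

    Hypothetical helper for illustration only.
    """
    return [i * stagger_nm for i in range(n_molecules)]


exact_quarter_nm = MOLECULE_LENGTH_NM / 4
stagger_fraction = D_PERIOD_NM / MOLECULE_LENGTH_NM
offsets = stagger_offsets(5, D_PERIOD_NM)

print(f"Exact quarter of the molecule: {exact_quarter_nm} nm")
print(f"D period as a fraction of molecule length: {stagger_fraction:.3f}")
print(f"Start positions of five staggered molecules (nm): {offsets}")
```

The D period is a little less than an exact quarter of the molecule length, which fits the wording "about one-quarter" in the text.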

The events that occur in the ER involve successive actions of ER-resident 
molecular chaperones that play either a general role in chaperoning a variety 
of secretory proteins or a specialized role for procollagen. These molecular 
chaperones also contribute to the quality control mechanism that ensures that 
only correctly-folded procollagen is secreted. 

In the following sections of this chapter, we will describe each step of colla- 
gen biosynthesis in detail, including recent observations. We will also describe 
how the various molecular chaperones participate in procollagen biosynthesis. 


2 
Initial Trimerization of α-Chains


The folding of procollagen begins with C-propeptide trimerization after the 
complete translocation of the proα-chains into the lumen of the ER. Before
triple helix-formation, the trimerized C-propeptide domains are stabilized by 
intermolecular disulfide bridges [11-14]. 

Since the arrangement of the α-chains in the trimer varies depending on
the types of collagen involved, the α-chains must associate in a distinct and
controlled manner that eliminates the possibility of generating hybrid trimers
that are composed of α-chains from different types of collagen. Exceptions to
this are proα1(XI) and proα2(XI), which can form hybrid heterotrimers with the
proα-chains of type V collagen in certain tissues [15-17]. Most
collagen-producing cells simultaneously synthesize different types of procollagen
molecules without producing incorrectly assembled trimers. For instance, skin
fibroblasts produce at least three fibril-forming collagens, namely, types I, III and
V, which are composed of six homologous α-chains, α1(I), α2(I), α1(III), α1(V),
α2(V) and α3(V). How can these different but homologous chains assemble in
such a selective manner? The triple helices themselves do not possess an
intrinsic ability to find the correct partners. Unlike the base pair formation that
results in a DNA duplex, the Gly-X-Y sequences in procollagen do not recognize
one another in a complementary residue-to-residue fashion. Rather, it is
the C-propeptide domains of procollagen that assure the type-specific chain
assembly of the individual procollagen chains. Observations of fibril-forming
procollagen molecules that contain naturally occurring mutations or artificial
deletions have revealed the important roles played by the C-propeptide in the
trimerization that is the initial step of procollagen folding [18, 19].

The C-propeptides of the proα-chains of fibril-forming collagens are
non-collagenous domains containing approximately 250 amino acid residues.
These C-propeptides are highly homologous to one another [20] (Fig. 3A-C) as
they contain eight conserved cysteine residues that are oxidized into four
cystine pairs during the folding/trimerization process. The four cysteine residues
located toward the N-terminus of the C-propeptide contribute to intermolecular
crosslinking while the latter four form intramolecular disulfide bridges [21,
22]. Although the atomic-level structure of a procollagen C-propeptide trimer 
has not been solved, a low resolution model of the trimer of the recombinantly 
expressed C-propeptide of human type III procollagen has been generated by 
laser light scattering, analytical centrifugation, and small angle X-ray scatter- 
ing [23, 24]. This revealed the C-propeptide trimer as a cruciform structure 
with three major lobes and one minor lobe (Fig. 3E). It was speculated that the 
three major lobes are the individually folded C-terminal regions of the 
C-propeptides and the minor lobe consists of the disulfide-bridged N-terminal 
regions of the C-propeptides that are connected to the triple helix region. 

Bulleid and co-workers have demonstrated by swapping the C-propeptides of
procollagens that the C-propeptide domains are necessary and sufficient to
direct the assembly of homotrimeric procollagens with correctly aligned triple
helices [25]. The region within the C-propeptide that is responsible for the
type-specific proα-chain association has been mapped to a peptide sequence of 23
amino acid residues. Within this sequence, 15 discontinuous amino acid residues
have been shown to be responsible for the chain selectivity [25] (Fig. 3C).

Another region that probably contributes to the association of the
proα-chains was recently found, namely, the α-helical coiled-coil motifs located at the
N-termini of the C-propeptide domains [26]. The coiled-coil motif contains
heptad repeats (a-b-c-d-e-f-g) in which positions a and d are occupied by
hydrophobic amino acid residues [27]. The α-helical coiled-coil structure had
been shown to be present in various collectins [2] and other collagen-like
proteins, including macrophage scavenger receptors [4, 5]. McAlinden et al. have
recently shown that not only these collagen-like proteins but also most types
of procollagen contain 2-4 heptad repeat sequences that should fold into
three-stranded coiled-coils (Fig. 3B). Indeed, they have shown that the
peptide with the heptad repeat sequence of the type II procollagen C-propeptide
forms a trimer that has an α-helical coiled-coil structure [26]. It is likely that
the C-propeptide coiled-coil region is sufficient to drive the trimerization of
procollagen molecules lacking the rest of the C-propeptide because when the
rest of the C-propeptide is replaced with the transmembrane domain of the
trimeric influenza virus haemagglutinin, it can still form trimers and
subsequently generate the triple helix [28]. However, such a coiled-coil formation is
only possible when the part of the C-propeptide that contains the molecular
recognition sequence of 15 amino acid residues is also present [26].
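
The heptad rule described above (hydrophobic residues at positions a and d) can
be illustrated with a short sketch. The set of residues treated as hydrophobic,
the function name and the toy sequence are illustrative assumptions and are not
taken from the cited work; real coiled-coil predictors use position-specific
scoring rather than this all-or-nothing test.

```python
# Hypothetical helper: count consecutive heptads whose 'a' and 'd'
# positions are both hydrophobic, reading in frame from `start`.
HYDROPHOBIC = set("AVILMFWY")  # assumed definition of "hydrophobic"

def count_heptads(seq, start=0):
    """Return the number of consecutive heptads from `start` that carry
    hydrophobic residues at position a (index 0) and d (index 3)."""
    count = 0
    for i in range(start, len(seq) - 6, 7):
        heptad = seq[i:i + 7]
        if heptad[0] in HYDROPHOBIC and heptad[3] in HYDROPHOBIC:
            count += 1
        else:
            break  # stop at the first heptad that violates the rule
    return count

# Toy sequence (not a real procollagen sequence): three ideal heptads
# followed by a flexible, non-heptad linker.
toy = "LKELEDK" * 3 + "GSGSGSG"
print(count_heptads(toy))  # 3
```

A run of 2-4 such heptads, as reported for most procollagens, would be the
kind of short motif this scan is meant to pick up.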


3 
Nucleation and Propagation of the Triple Helix 


3.1 
Nucleation of the Triple Helix 


After the three proα-chains associate at the C-propeptide domains, the trimer
is stabilized by forming interchain disulfide bridges. In the case of type III pro- 
collagen, additional interchain disulfide bridges are formed in the C-telopeptide 
region, which is a short non-collagenous sequence that links the triple helical 
domain with the C-propeptide. The triple-helical regions, which can be as long 
as 1000 amino acid residues, then start to twine around one another, thereby 
propagating the formation of the triple helix toward the N-termini. 

The collagen triple helix is a right-handed supercoil consisting of three
polyproline II-like left-handed helices in which all peptide bonds are in the
trans configuration [27, 29, 30]. To form the triple helix, it is essential that every
third residue is a Gly; no other amino acid can replace these Gly residues. The
Gly residues are buried in the core of the helix. Unlike the α-helical coiled-coil,
the collagen triple helix does not bear direct intrachain hydrogen bonds
between the main-chain amide and the carbonyl groups; instead, all the NH-CO
hydrogen bonds are formed only between the different polypeptide chains. This
means the triple helix formation is probably coupled to the concomitant
folding of individual proα-chains into the polyproline II helix. In addition, since the
collagen triple helix does not possess a hydrophobic core, the formation of the 
triple helix is not expected to employ the hydrophobic interactions that most 
other proteins utilize in their folding. 
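
The requirement that every third residue be Gly can be checked mechanically.
The following is a minimal sketch, assuming one-letter amino acid codes and a
repeat that is in frame from the first residue; the function name and both test
sequences are hypothetical.

```python
def glycine_interruptions(seq):
    """Return the 0-based indices of triplet-start positions that are not
    Gly, assuming the Gly-X-Y repeat is in frame from the first residue."""
    return [i for i in range(0, len(seq), 3) if seq[i] != "G"]

# Hyp has no standard one-letter code, so Pro (P) stands in for it here.
ideal = "GPPGPPGPPGPP"    # four uninterrupted Gly-X-Y triplets
mutant = "GPPGPPAPPGPP"   # Gly -> Ala substitution in the third triplet

print(glycine_interruptions(ideal))   # []
print(glycine_interruptions(mutant))  # [6]
```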

To form a correctly aligned triple helical procollagen molecule, the initial 
nucleation of the helix must occur at one distinct site. If triple helix formation 
is simultaneously initiated at multiple nucleation sites, mis-aligned and hence 
only partially triple helical molecules like gelatin would be produced. Thus, 
there must be a mechanism in the procollagen folding pathway that
directs the nucleation to a distinct site and/or prevents unfavorable nucleation
of the triple helix. It has been suggested that the C-terminal end of the triple
helix-forming domain nucleates first in order to facilitate C to N zipper-like
folding. Bulleid and co-workers have elucidated the mechanism of the
nucleation of the triple helix by utilizing the semi-permeabilized cell system that
enables the analysis of the events occurring in the ER [28, 31]. In the case of
type III procollagen, the nucleation was found to be directed by the prolyl
4-hydroxylation of at least two Pro residues at the C-terminal end of the
Gly-X-Pro repeats. Such Gly-X-Pro triplet repeats are also found at the C-terminal
ends of the triple helical domains in other fibril-forming procollagens. Once
the thermal stability of these sequences is increased by the prolyl
4-hydroxylation, the C-terminal triplet repeats start to wind around one another.
This nucleation would be supported by an additional stabilizing effect of the
adjacent trimerized C-propeptide domain, including the coiled-coil region.
Simultaneous multi-site nucleation may also be inhibited by interactions with
ER-resident molecular chaperones.


3.2 
Propagation of the Triple Helix 


Once nucleated at the C-terminus of the Gly-X-Y repeats, the triple helices 
propagate toward the N-terminus. This process is often compared to fastening 
a zipper. However, it is still unclear whether this process actually does proceed 
as smoothly as fastening a zipper. The fastening of the triple helix is a relatively 
slow process compared to the folding of globular proteins. In the triple helical 
region comprised of Gly-X-Y repeats, the imino acids Pro and 4-hydroxyproline
(Hyp) are most frequently found in the X and Y-positions, respectively. As
peptide bonds preceding a Pro or a Hyp residue often take a cis
configuration, the cis-peptide bonds of the proα-chains must isomerize into a trans
configuration prior to or upon triple helix formation. This isomerization has
been found to be the rate-limiting step in the propagation of the triple
helix-forming process both in vitro and in vivo [13, 32, 33]. However, this step has
been shown to be accelerated by the action of the peptidyl prolyl cis-trans
isomerases (PPIases), which also function as molecular chaperones. The rate
of triple helix propagation may also be greatly affected by the local amino acid
sequences [34-36]. It has been suggested that the propagation of the triple
helix may occur in a punctuated fashion that involves local micro-unfolding.

In general, proteins possess sufficient but still marginal structural stability 
against thermal denaturation. This has long been suggested to be true for the 
thermal stability of collagen as well. For example, the melting temperature (Tm) 
of human type I collagen was determined to be 41-42 °C, which is only slightly 
above the human body temperature [34, 37, 38]. However, Leikina et al. have
recently overturned this widely accepted paradigm by concluding that the
triple helical region of type I collagen is actually thermally unstable at body
temperature [39]. The authors analyzed the thermal unfolding of the triple
helix by ultra-slow scanning calorimetry and isothermal circular dichroism
and found that the thermal stability of triple helical human type I collagen is
below 36 °C, and that at 37 °C the collagen prefers to exist as random coils. This
conclusion was supported by the recent observation that the equilibrium Tm
value of pN type III collagen (collagen with N-propeptide) is also below
body temperature [40]. These findings indicate that there is a critical problem
in procollagen folding in the cells of homeothermic animals, namely, the Gly-X-Y repeats
of proα-chains do not possess an intrinsic ability to fold into the triple helical
conformation at body temperature, and the propagating triple helix is always
confronted with thermal unfolding.

How, then, can the cell complete the thermodynamically challenged folding 
of procollagen? Bruckner et al. previously reported that the stability of the triple
helix is higher within the cell than in vitro and suggested that there may be a
system in the ER that stabilizes the triple helix [41]. More recent evidence now 
suggests that ER-resident molecular chaperones, most likely Hsp47, may con- 
tribute to the stabilization of the triple helical folding intermediates of procol- 
lagen [42, 43]. This issue will also be touched on in the next section. 


4 
Molecular Chaperones Involved in Procollagen Biosynthesis 


The folding, assembly and transport of proteins are often assisted by successive
actions of molecular chaperones within the cell. The ER also possesses a set of
molecular chaperones that are utilized in the biosynthesis of secretory proteins,
lumenal ER proteins, and transmembrane proteins. The biosynthesis of
procollagen molecules also requires the assistance of molecular chaperones residing
in the ER. This is supported by the finding that procollagen is co-immunoprecipitated
with most of the ER-resident chaperones [44, 45]. The ER chaperones involved
in procollagen biosynthesis fall into two classes. One class includes the general
ER chaperones such as Bip/Grp78, Grp94, calnexin, calreticulin, protein disulfide
isomerase (PDI), and the PPIases. These chaperones function with a
variety of client proteins as they have broad binding specificities. Apart from
the PPIases, this class of chaperones mainly binds to the non-collagenous
(propeptide) domains of procollagen. The other class of chaperones includes
the procollagen-specific molecular chaperones such as Hsp47 and P4-H, the latter of which
has also been recognized to be a major procollagen-modifying enzyme. These
specific chaperones interact with the collagenous domain comprised of the
Gly-X-Y repeats, and recent studies have revealed that these ER-resident
chaperones are involved in the folding, cellular trafficking and quality control of
procollagen molecules (Table 1, Fig. 4).


Table 1 Molecular chaperones involved in collagen biosynthesis

Chaperone      Major binding region               Comment
Bip/Grp78      C-propeptide                       Hsp70 homolog in the ER
Calnexin       C-propeptide (N-linked sugar)      ER-resident lectin with a transmembrane domain
Calreticulin   C-propeptide (N-linked sugar)      A calnexin homolog without the transmembrane domain
PDI            C-propeptide                       Also functions as the β-subunit of P4-H
P4-H           Gly-X-Y repeats (single-chain)     Known as the principal modification enzyme for procollagen
PPIase         Gly-X-Y repeats (single-chain)     Some members of this chaperone family may be involved in procollagen folding
Hsp47          Gly-X-Y repeats (triple helical)   Collagen-specific heat-shock protein found in the ER

Fig. 4 Chaperone binding-sites on a procollagen molecule

The procollagen molecules have been shown to exist in an aggregated state
and to form a reticular-like matrix in the ER lumen in vivo. The size of the
aggregates can exceed 1500 kDa and they contain PDI [46]. This state of
procollagen is also supported by the observations that the folding of the
procollagen molecules occurs when the polypeptide chains are still in close 
association with the ER membrane, and that the membrane binding might be 
mediated by the ER chaperones [47]. These findings suggest that the folding of 
procollagen molecules may be accomplished with the assistance of a variety 
of ER-resident chaperones that form large chaperone complexes with the pro- 
collagen. In the following subsections, we will summarize the structures and 
functions of the individual ER-resident molecular chaperones that are involved 
in procollagen biosynthesis. 


4.1 
Bip/Grp78 


Bip/Grp78 is the ER homologue of Hsp70 and bears the ER-retrieval signal 
Lys-Asp-Glu-Leu (KDEL) sequence at its C-terminal end. Bip, like other mem- 
bers of the Hsp70 family, shows broad binding specificity to short stretches of 
hydrophobic sequences [48]. Bip has been shown to interact stably with mutated
C-propeptide domains in cells from osteogenesis imperfecta patients. In addition, the
expression level of Bip in these cells was increased [18, 49]. This elevated
expression of Bip is probably due to the activation of the unfolded protein 
response (UPR) pathway in the ER [50]. Consequently, it has been suggested 
that Bip is involved in the quality control mechanism that ensures only cor- 
rectly-folded procollagen is transported to the Golgi apparatus, since it may 
capture procollagen molecules that have inappropriately-folded C-propeptide 
domains. However, it remains unclear whether Bip participates in the actual 
folding and quality control of procollagen in the ER. 


4.2 
Calnexin and Calreticulin 


The C-propeptides of the α-chains of fibril-forming procollagens have the
consensus sequence Asn-X-Thr/Ser for the addition of N-linked glycans. After the
nascent proα-chain enters the ER lumen, an N-linked oligosaccharide is attached
to its C-propeptide region. This allows the proα-chain to be recognized by
calnexin and/or calreticulin. Calnexin and calreticulin are homologous
ER-resident lectins that act as molecular chaperones by recognizing the
monoglucosylated form of the N-linked oligosaccharide of newly synthesized
glycoproteins. The former is a
membrane-bound protein whose lectin domain is oriented toward the ER
lumen, and the latter is an ER-lumenal protein [51]. Although these lectin-type
chaperones have been found to play critical roles in the folding and quality
control of various glycoproteins, they have not been shown to function
significantly in procollagen biosynthesis. Indeed, when the binding of the
procollagen C-propeptides to calnexin/calreticulin was abolished by mutation of the
N-linked glycosylation sites, the assembly, folding, or secretion of the
procollagen molecules did not appear to be affected [49].
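
The consensus sequence Asn-X-Thr/Ser lends itself to a simple pattern search.
The sketch below is illustrative only: the function name and toy sequence are
hypothetical, and the exclusion of Pro at the X position is a commonly used
convention that is assumed here rather than stated in the text.

```python
import re

# Asn, then any residue except Pro (assumption), then Ser or Thr.
# The lookahead lets overlapping candidate sites be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(seq):
    """Return 0-based start positions of candidate N-glycosylation sites."""
    return [m.start() for m in SEQUON.finditer(seq)]

toy = "AANGTAANPSAANNSS"  # hypothetical sequence, one-letter codes
print(find_sequons(toy))  # [2, 12, 13]
```

A match only marks a candidate site; whether a given Asn is actually
glycosylated in the ER cannot be inferred from the sequence alone.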


4.3 
Protein Disulfide Isomerase (PDI) 


PDI is one of the most abundant proteins residing in the lumen of the ER. 
Mammalian PDI functions as a dimer of the 57 kDa subunit. PDI plays a 
central role in the folding of various secretory and membrane proteins that 
contain disulfide bridges by catalyzing the correct pairing of their disulfide 
bonds [52]. The disulfide exchange reaction proceeds through the formation of
mixed disulfide bonds between the client proteins and PDI in a redox-dependent
manner [53]. PDI also transiently interacts with various polypeptides as a
molecular chaperone to facilitate correct protein folding, even if they have no
cysteine residues [54]. This chaperone activity of PDI has recently been reported to
be independent of its redox state [55]. In addition, PDI exists as a component
of multi-subunit proteins in the ER as it serves as the β-subunit of P4-H and as
a small subunit in the microsomal triglyceride transfer protein.

The PDI polypeptide has a unique organization that consists of five structural
domains, namely, a-b-b'-a'-c. The N-terminal four domains (a-b-b'-a')
have similar structural topology to that of thioredoxin [56, 57]. Of these, the 
highly homologous a and a' domains contain a thioredoxin motif consisting 
of Cys-X-X-Cys that is responsible for catalyzing the thiol-disulfide exchange 
reaction. The non-catalytic b' domain contains the principal client-binding site 
[58, 59]. The typical ER-retrieval signal Lys-Asp-Glu-Leu (KDEL) is found at the 
C-terminus of the c domain. 
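
Both sequence features mentioned above, the Cys-X-X-Cys motif and a C-terminal
retrieval signal, are easy to locate in a protein sequence. The following is a
minimal sketch; the function names are hypothetical and the toy sequence is
invented for illustration and is not the PDI sequence.

```python
import re

def cxxc_sites(seq):
    """Return 0-based start positions of Cys-X-X-Cys motifs (overlaps allowed)."""
    return [m.start() for m in re.finditer(r"(?=C..C)", seq)]

def has_er_retrieval_signal(seq, signal="KDEL"):
    """True if the sequence ends with the given C-terminal tetrapeptide."""
    return seq.endswith(signal)

# Invented sequence with two C-X-X-C motifs and a C-terminal KDEL.
toy = "MAACGHCKAAAACGHCKAAKDEL"
print(cxxc_sites(toy))               # [3, 12]
print(has_er_retrieval_signal(toy))  # True
```

Passing signal="RDEL" to the second function would test for the variant
retrieval sequence instead.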

During procollagen biosynthesis, PDI interacts with procollagen either as 
the molecular chaperone or as a subunit of P4-H. PDI appears to associate with 
procollagen at its C-propeptide region [60, 61]. Since the trimerized forms of 
the C-propeptide domains of procollagens contain inter- and intramolecular 
disulfide bridges, the C-propeptide domains are likely to be substrates in 
PDI-catalyzed folding. In fact, it has been demonstrated that the binding of 
PDI to the C-propeptide plays a crucial role in coordinating the trimeric 
assembly of the proα-chains [60]. The individually expressed procollagen
C-propeptide domains also remain associated with PDI and are retained in 
the ER until they are folded into native trimeric structures [61]. The stable in- 
teraction of PDI with C-propeptides that are unable to form trimers may also 
be involved in the quality control of procollagen, as has been described for 
Bip/Grp78. 

It has also been found by utilizing the semi-permeabilized cell system that
there is an additional interaction of PDI with fully folded type X collagen [62].
This interaction was not thiol-dependent and was abolished when the pH was
lowered to 6.0, a behavior similar to that of the interaction between Hsp47 and
collagen [63]. Unlike those of fibril-forming procollagens, the non-collagenous domains
of type X collagen are not enzymatically cleaved [64]. It may be that PDI helps
to prevent the aggregation of type X collagen through its binding.


4.4 
Prolyl 4-hydroxylase (P4-H) 


P4-H is an essential enzyme in the post-translational modification of
procollagen. The mammalian P4-Hs are heterotetramers that consist of two α
subunits and two β subunits. The α subunit has an active site for the catalysis of
prolyl 4-hydroxylation and three different subtypes of the α subunit have been
identified [65-67]. The β subunit dimer of P4-H is identical to PDI. The β
subunits are essential in the formation of an enzymatically active α2β2 tetramer,
although they do not participate directly in the catalytic activity of prolyl
hydroxylation. The catalytic function of the β subunit as PDI is not involved in
the tetramerization of P4-H because β subunits whose catalytic sites for
PDI activity have been mutated can still form an α2β2 tetramer with full P4-H
activity [68]. Thus, the P4-H β subunits may instead play a role in preventing
the highly insoluble α subunit from forming inactive aggregates [69]. P4-H is
localized in the lumen of the ER by virtue of the ER-retention signal in the β
subunits. The expression level of the α subunit is lower than that of the β
subunit, which allows the β subunits to act as PDI, independent of their function
in P4-H.

The central role of P4-H in procollagen biosynthesis is its hydroxylation of
the Pro residues at the Y positions of the Gly-X-Y repeats. This modification
increases the thermal stability of the triple helix. Accumulating evidence suggests
that P4-H also acts as a procollagen-specific molecular chaperone through its
binding to Gly-X-Y repeats in single-stranded proα-chains. Recently, the
substrate-binding domain in the P4-H α subunits was identified by partial
enzymatic digestion of the tetrameric enzyme [70]. The substrate-binding domain
was distinct from the catalytic domain. The single-chain collagenous sequences
probably bind to the P4-H tetramer by interacting with this substrate-binding
domain. Recognition of the proα-chains by this domain would facilitate the
effective prolyl 4-hydroxylation of the long collagenous sequences. Although the
hydroxylated model substrate (Gly-Pro-Hyp)n is a less effective binder to the
domain than the non-hydroxylated peptide (Pro-Pro-Gly)n [71], the association
of P4-H with single-chain proα-chains has been observed in the
semi-permeabilized cell system even after prolyl 4-hydroxylation has occurred [60]. As
described above, the post-translational modifications of procollagen, including
prolyl 4-hydroxylation, start before the translocation of the entire polypeptide
is completed, and in the folding pathway, the triple helix-forming region of
procollagen stays in an unfolded state until the proα-chains trimerize at the
C-propeptide regions. The interactions of P4-H and even other modification
enzymes with the triple helix-forming region of the single-chain procollagens
have been suggested to keep the chains in a folding-competent state, thus
avoiding unfavorable nucleation and subsequent mis-aligned triple helix formation.
Fig. 5 Possible role of P4-H as a procollagen-specific molecular chaperone


In addition, the binding of P4-H to procollagen may contribute to the quality 
control of the procollagen molecules before they are transported to the Golgi 
apparatus since P4-H appears to capture the unfolded or incompletely folded 
procollagens that still have a single chain region, thus forcing their retention in 
the ER lumen until correct folding is accomplished [72]. Abnormal procollagens
bearing genetic mutations or deletions in the collagenous domain have also
been shown to interact stably with P4-H and to be retained in the ER [73]
(Fig. 5). 


4.5 
Peptidyl-Prolyl cis-trans Isomerase (PPIase)


The triple helix-forming region of procollagen contains large amounts of imino
acid residues. In the Gly-X-Y repeats, about one-third of the X and Y positions
are occupied by Pro and Hyp, respectively. Although peptide bonds in the cis
configuration are energetically unfavorable in most cases, the peptide bonds
preceding a Pro or Hyp residue include a considerable proportion of cis
conformers. It has been estimated that 15% of the Z-Pro and 8% of the Z-Hyp peptide
bonds (Z denotes any amino acid residue) in nascent proα-chains of type I
collagen are in the cis configuration [74]. Prior to triple helix formation, these
cis-peptide bonds in the proα-chains must be converted to the trans configuration.
This cis-trans isomerization step is known to be the rate-limiting step in the
propagation of the triple helix both in vitro [13, 33] and in vivo [75].
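
To give a sense of scale, the figures quoted above can be combined into a rough
estimate of how many cis bonds a single chain must isomerize. This is a
back-of-the-envelope sketch, not a result from the cited studies. It assumes a
triple helical region of about 1000 residues, reads "one-third" as one-third of
X positions being Pro and one-third of Y positions being Hyp, and takes the cis
fractions as 15% for Z-Pro and 8% for Z-Hyp; the function name and parameters
are hypothetical.

```python
def expected_cis_bonds(n_residues=1000, frac_pro_x=1/3, frac_hyp_y=1/3,
                       cis_pro=0.15, cis_hyp=0.08):
    """Estimate the mean number of cis peptide bonds in one unfolded chain."""
    triplets = n_residues // 3            # number of Gly-X-Y triplets
    n_pro = triplets * frac_pro_x         # Pro residues in X positions
    n_hyp = triplets * frac_hyp_y         # Hyp residues in Y positions
    return n_pro * cis_pro + n_hyp * cis_hyp

print(round(expected_cis_bonds(), 1))  # 25.5
```

Under these assumptions each chain carries on the order of 25 cis bonds, which
is consistent with isomerization being slow enough to limit the rate of helix
propagation.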

PPIases catalyze the cis-trans interconversion of prolyl peptide bonds in
various proteins and are reported to have a chaperone-like activity [76, 77].
Members of the PPIase family are ubiquitously distributed in various organelles
of various organisms. Proteins belonging to this family are classified into three
subfamilies based on their differing sensitivities to small-molecule inhibitors.
Cyclophilins are PPIases whose catalytic action is inhibited by the
immunosuppressant cyclosporin A. The activity of the members of the FKBP subfamily is
inhibited by FK506, another immunosuppressant. The parvulin subfamily proteins
are also included in the PPIase family. The ER of mammalian cells contains at
least four different PPIases, namely, cyclophilin B, cyclophilin C, FKBP13 and
FKBP65.

The importance of the PPIases in procollagen biosynthesis has been demon- 
strated by showing that inhibition of PPIase activity by cyclosporin A and FK506 
delays the triple helix formation of procollagen in cellulo [34, 78]. Although it
is still unclear which ER-resident PPIase plays a central role in the triple helix
formation of procollagen, at least two, cyclophilin B and FKBP65, have been
shown to be major PPIases that directly interact with procollagen [79, 80].


4.6 
Hsp47 


Of the ER-resident molecular chaperones that are involved in procollagen 
biosynthesis, Hsp47 is one of the most important. Hsp47 was first identified as 
a collagen-binding heat-shock protein (HSP) with a molecular size of 47 kDa 
[81]. It is identical to the proteins that have been reported as colligin [82] and 
gp46 [83]. Hsp47 is a member of the serine protease inhibitor (SERPIN) super- 
family but it does not show protease inhibitory activity. Hsp47 contains the ER 
retrieval signal Arg-Asp-Glu-Leu (RDEL) sequence at its C-terminus and thus
resides in the ER lumen [63, 84]. By using in vivo chemical crosslinking and 
immunoprecipitation techniques, Hsp47 has been revealed to associate with 
procollagen in the ER [44]. The interaction is transient and the procollagen mol- 
ecules dissociate from Hsp47 upon or just before reaching the cis-Golgi [84]. 
Hsp47 was assumed to be a molecular chaperone that is specific for procol- 
lagen biosynthesis because of its transient association with procollagen in the 
ER in addition to the fact that it is a heat-inducible protein [85]. However, 
the real function of Hsp47 in procollagen biosynthesis had not been elucidated
until recently. In the remainder of this section, we will summarize these recent
advances in our understanding of the role Hsp47 plays in procollagen biosyn-
thesis. 

In 2000, Nagai et al. succeeded in disrupting the hsp47 gene in mice and
demonstrated the indispensable role of Hsp47 in procollagen biosynthesis for
the first time [86]. Although heterozygous mice (hsp47+/-) showed no evident
phenotype, homozygous mice (hsp47-/-) showed an embryonic lethal
phenotype; these mice died before 11.5 days post coitus (dpc) with severe impairment
of collagen-based tissue structures, including reticular fibers in the mesenchyme
and basement membranes (Fig. 6). Fibroblasts established from the hsp47-/-
mice produced and secreted abnormal (probably incorrectly aligned) type I
procollagen molecules into the medium. Embryonic stem (ES) cells in which both
alleles of the hsp47 gene were disrupted also secreted insufficiently folded
type IV collagens into the medium and did not produce basement membrane-like
structures in the embryoid bodies [87]. It should be noted that the
phenotype of the hsp47-/- mice is more severe than that of any other reported
knockout mouse in which individual collagen proα-chains have been disrupted [88,
89]. These findings clearly suggest that the function of Hsp47 is not limited to 
a distinct type of collagen; rather, it may be that Hsp47 acts as a chaperone for 
various types of procollagens. This suggestion is supported by the observation 
that Hsp47 can interact with various types of collagens in vitro [90]. 

Fig. 6 Hsp47 null mice. A 9.5 dpc. B 10.5 dpc. a, b Neuroepithelium and underlying
mesenchyme (9.5 dpc); c, d structures of basement membranes (9.5 dpc)

What is the function of Hsp47, and which step in collagen biosynthesis is
facilitated by the action of Hsp47? Some clues to answer those questions were
obtained from recent studies on the mechanism by which Hsp47 recognizes
procollagen. It was not clear until recently which procollagen conformation is
actually recognized by Hsp47. Earlier in vitro and in vivo studies indicated that
Hsp47 interacts with triple helical collagens [84, 90], while
immunoprecipitation analyses indicated that the polysome-bound nascent proα-chain already
binds to Hsp47 [45, 84]. Thus, Hsp47 appeared to interact with both single
collagen chains and triple helical procollagen molecules. The conformation of 
procollagen that is recognized by Hsp47 has recently been re-examined using
more sophisticated strategies. Bulleid's group and our own have independently
demonstrated by different approaches that Hsp47 preferentially recognizes the
correctly folded triple helical conformation. The former group took advantage
of the semi-permeabilized cell system and engineered a mini-procollagen with
a truncated triple helical region. They then showed that only triple helical
molecules were co-immunoprecipitated with Hsp47 [42]. In our studies, we selected
Hsp47-binding peptides from a random collagen-like peptide library that
encodes (Gly-Pro-Y)n repeats by using a yeast two-hybrid system with Hsp47
as bait. Peptides whose Y-positions
contained amino acids that stabilize the helix were positively selected [43].
In contrast, the peptides whose Y-positions were occupied by amino acids that 
destabilized the helix were not selected. Thus, the correctly folded triple heli- 
cal conformation of procollagen is preferentially recognized by Hsp47. This 
makes Hsp47 unique as, in general, molecular chaperones recognize polypep- 
tides with an as-yet non-native tertiary structure and are released from them 
after they have folded into their native (or correctly folded) structures. This 
general phenomenon is also the case for the other ER-chaperones that assist the 
proper folding of procollagen. As described in this section, the capturing and 
retention of unfolded and misfolded procollagen molecules by Bip, PDI, and 
P4-H is the key mechanism of the quality control system that ensures the
generation of properly folded collagens. Thus, the conformational preference of
Hsp47 constitutes a unique client-recognition mechanism.

The sequence dependency of the interaction of Hsp47 with procollagen 
triple helices has also been elucidated recently. Koide et al. found that triple 
helical collagen model peptides that contain an Arg residue in the Y-position 
are specifically recognized by Hsp47 in vitro [91]. The interaction of Hsp47 
with Arg-containing collagenous peptides is much stronger than its recognition
of the previously identified Hsp47-binding collagen mimetic (Pro-Pro-Gly)n
[92]. The importance of Arg residues in types I and III collagen was also
demonstrated by using chemically modified collagen. Hsp47 binding
was abolished only when the Arg residues on native collagen were
specifically modified through 2,3-butanedione treatment, whereas chemical
modifications of other residues, including Lys, Glu, Asp and His, only slightly affected
the Hsp47-collagen interaction [91]. This sequence-specific interaction between
procollagen and Hsp47 has also been demonstrated by using genetically 
engineered model procollagen in semi-permeabilized cells [93]. Based on the 
simple assumption that Hsp47 recognizes X-Arg-Gly sequences and that any 
adjacent sequences do not affect the binding, it has been estimated that 
type III collagen, for instance, bears 41 possible Hsp47 binding sites. However, 
more recent work on the in vitro interaction between Hsp47 and collagen has
suggested that there are additional structural requirements apart from the
critical Arg residues, since only limited CNBr fragments of collagen could in- 
teract with Hsp47 [94]. At this stage, we cannot precisely estimate the stoi- 
chiometry of the Hsp47-procollagen complex but we believe that a procollagen 
molecule most probably possesses multiple Hsp47 binding sites. 
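
The simple counting assumption mentioned above (one possible site per
X-Arg-Gly, with Arg in the Y position) can be expressed as a short sketch. The
function name and the toy sequence are hypothetical; the count it returns is
only an upper bound under that assumption and ignores the additional
structural requirements noted in the text.

```python
def count_y_arg_sites(seq):
    """Count in-frame Gly-X-Arg triplets, i.e. Arg residues in the Y position.

    Assumes one-letter codes and a Gly-X-Y repeat in frame from the first
    residue. In an uninterrupted repeat each such Arg is followed by the Gly
    of the next triplet, giving an X-Arg-Gly sequence.
    """
    return sum(1 for i in range(0, len(seq) - 2, 3)
               if seq[i] == "G" and seq[i + 2] == "R")

# Toy six-triplet sequence (not a real collagen sequence).
toy = "GPRGPPGERGAPGPRGPP"
print(count_y_arg_sites(toy))  # 3
```

Applied to a real triple helical domain, this kind of count corresponds to the
estimate of possible binding sites quoted for type III collagen.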

Although more has been learned in the last few years about the mechanism
by which Hsp47 recognizes procollagen, its function remains enigmatic, as
does the way in which this molecule associates with and dissociates from
procollagen. The following possible functions of Hsp47 have been proposed to date [95-97]:


1. Inhibition of intracellular procollagen degradation [98, 99].
2. Quality control of procollagen [92].
3. Control of procollagen transport to Golgi apparatus [100].
4. Stabilization of triple helical folding intermediates of procollagen [42, 43].
5. Inhibition of procollagen aggregate formation in the ER [101].


Conceptually, recent findings argue for the latter two functions, as will be 
discussed below. 

As described above, the triple helical region of a collagen molecule prefers
a random coil structure under equilibrium conditions at 37 °C [39]. This finding
strongly suggests the existence of a factor that stabilizes the triple helical 
portions in the folding intermediates of procollagen in the cells. Since Hsp47 
preferentially binds to correctly folded triple helical portions of procollagen, 
this chaperone may be the best candidate as the triple helix stabilizer. This 
scenario is consistent with the fact that Hsp47 is induced by heat shock. All 
the ER-resident molecular chaperones have been reported to be induced by 
ER stress through the UPR pathway but only Hsp47 is induced by heat shock. 
Under heat-stress conditions, procollagen triple helix-formation should be 
more difficult, and the collagen triple helix would be destabilized. In such 
heat-stress conditions, the cells would require higher levels of Hsp47 to sta- 
bilize the procollagen triple helix. However, the likelihood of this mechanism 
is still being debated. Bächinger and co-workers have reported that the thermal 
stability and refolding rates of types I and III collagen in vitro do not 
increase in the presence of Hsp47 [40]. Thus, this putative role Hsp47 may play 
in procollagen biosynthesis has to be corroborated by additional molecular 
studies. 

Yet another probable function of Hsp47 is that it inhibits the aggregation of 
procollagen in the ER. Accordingly, it has been shown that the addition of Hsp47 
prevents the self-association of the triple helix form of type I collagen in vitro 
[101]. Although it is not clear if this mechanism also works in the ER, there is 
supportive evidence that procollagen molecules tend to laterally associate in 
the Golgi apparatus, which is where the procollagens dissociate from Hsp47, 
and that this results in large molecular aggregates [10] (Fig. 7). 

Fig. 7 Possible role of Hsp47 (panel label: stabilization of triple helical intermediates) 

Of the ER-resident chaperones involved in procollagen biosynthesis, only 
Hsp47 binds to correctly folded procollagen. Moreover, even after correct folding 
has been completed, Hsp47 still remains bound to the procollagen in the 
ER. How does Hsp47 dissociate from procollagen? It has been shown previously 
that Hsp47 dissociates from procollagen after leaving the ER, probably in the 
ER-Golgi intermediate compartment (ERGIC) or in the cis-Golgi [84]. A pre- 
vious in vitro study also showed that the Hsp47-collagen interaction is sensi- 
tive to a change in the pH as Hsp47 dissociates from collagen in vitro when 
the pH is below 6.3 [63]. The lumenal pH of the pre-Golgi structures gradually 
decreases in parallel with their translocation to the Golgi region [102], and the 
pH in the trans-Golgi area has been reported to be even lower than pH 6. Taken 
together, these observations suggest that the dissociation of Hsp47 from pro- 
collagen may be caused by a change in the pH during its trafficking from the 
ER toward the Golgi apparatus. Alternatively, the dissociation may occur by the 
simple dilution of the procollagen-Hsp47 complex upon reaching the Golgi 
cisternae which harbor only a few free Hsp47 molecules [90]. 

In vertebrate tissues, the constitutive expression of Hsp47 shows a good 
positive correlation with that of the major types of collagens [85, 95, 103]. Cells 
producing higher levels of collagens also produce higher levels of Hsp47 and 
vice versa. In particular, the expression of Hsp47 is dramatically up-regulated 
under some pathophysiological conditions associated with collagen overproduction 
or with abnormal accumulation of collagen. This correlation between 
collagen and Hsp47 expression has been suggested to be controlled at the tran- 
scriptional level, and cis-acting elements that are responsible for the tissue-spe- 
cific expression of Hsp47 have been identified [104, 105]. These observations 
also support the notion that Hsp47 is an essential molecular chaperone for 
the proper biosynthesis of procollagen. However, this correlation between the 
expression of collagen and Hsp47 is not observed in invertebrates as Hsp47 has 
been found only in vertebrates higher than zebrafish. Genomic screens for 
Hsp47 protein or gene homologues in Drosophila melanogaster and Caenorhabditis 
elegans have not been successful so far (Nagata et al., unpublished 
result), in spite of the fact that collagen is found in all multicellular animals. 
This suggests that Hsp47 may be a crucial component only in the vertebrate 
system of procollagen biosynthesis and that invertebrates follow a different 
scenario of procollagen production. 


5 
Intracellular Trafficking of Procollagen Molecules 


5.1 
ER-to-Golgi Transport 


Secretory proteins folded and modified in the ER are generally secreted to the 
extracellular space via the Golgi apparatus. At the Golgi apparatus, proteins are 
further modified for the sorting to their final destinations. Once triple helix for- 
mation has been completed, procollagen molecules are also transported to the 
Golgi cisternae. Pre-Golgi trafficking of secretory proteins has been studied 
mainly using ts-O45-G, a temperature-sensitive glycoprotein of the vesicular 
stomatitis virus. The vesicular coat complexes COPII and COPI play important 
roles in the protein trafficking between the ER and the Golgi apparatus [106- 
108]. Cargo proteins to be transported are first concentrated into buds coated 
by COPII, followed by the formation of nascent transport vesicles. These are 
subsequently coated with the COPI complex to form larger COPI-coated vesi- 
cles [109, 110]. The complexes then segregate into cargo-rich and COPI-rich 
domains. Only the latter returns to the ER, while the former is transported to 
the Golgi together with the cargo proteins. 

The size of the classical transport vesicle that is generated by budding of 
ER-membrane is 60-80 nm in diameter, which may be too small to carry a pro- 
collagen molecule, which consists of a rigid 300 nm-long triple helical rod-like 
structure. Although the ER-to-Golgi transport of procollagen molecules has not 
been extensively investigated, Stephens and Pepperkok have recently shown that 
a distinct pathway is used to transport procollagen [111]. They analyzed the 
transport of procollagen in living cells by utilizing a genetically engineered pro- 
collagen in which green fluorescent protein (GFP) is fused to the C-terminal 
propeptide of the proα1(I) chain. The transport complexes that contained procollagen 
molecules were distinctively different from those containing other cargos, 
such as the ts-O45-G protein, although procollagen exits the ER via COPII-coated 
exit sites. The procollagen transport complexes did not contain the ERGIC- 
53/p58 protein that all ER-to-Golgi transport complexes identified to date seem 
to carry. It has also been suggested that active processes are involved that 
concentrate the procollagen molecules into the transport vesicles upon ER budding. 
Electron microscopic observations have revealed that the 
procollagen-containing ER-to-Golgi transport complexes are tubular, 
sac-like structures with a length exceeding 300 nm [10, 112]. 


5.2 
Intra- and Post-Golgi Transport 


Once procollagen molecules reach the cisternae of the Golgi apparatus, the pro- 
collagen molecules self-associate to form aggregates approximately 320 nm in 
length and 170 nm in thickness [9, 10, 112]. These aggregates may result from 
the lateral packing of the rod-like procollagen molecules. Although how this 
aggregate formation occurs is not clear, the aggregate structure is reminiscent 
of the segment-long-spacing (SLS) aggregates of collagen with zero-D-arrayed 
molecular packing that can be formed in vitro by the addition of ATP. In fact, 
it has been found that the procollagen and partially processed procollagen 
secreted by chick embryo tendon cells are similar in structure to the SLS 
aggregates [113]. It should be noted again that the cis-Golgi is an organelle 
where procollagen molecules are liberated from Hsp47 for the first time. The 
dissociation of Hsp47 may result in the aggregation of procollagen in the 
cis-Golgi. In general, the formation of large aggregates in cells is considered to 
be the consequence of dead-end folding and harmful to the cells. While the func- 
tional advantage of forming procollagen aggregates has not been determined as 
yet, one can speculate that it may be a way to protect procollagen molecules from 
attack from proteases in the Golgi apparatus such as matrix metalloproteases. 
It is also possible that procollagen, the individual triple helical domain of which 
does not possess sufficient thermal stability at body temperature [39], acquires 
thermal stability by forming the lateral aggregates. 

The procollagen aggregates that form in the Golgi cisternae then move 
across the Golgi stacks without leaving the lumen of the Golgi cisternae [10]. 
It is still unclear whether this "cisternal maturation model" is applicable to the 
anterograde transport of small soluble proteins that is mediated by the 
COPII-vesicular transport system [114]. However, several observations, including 
electron microscopic analyses, suggest this model also applies to the anterograde 
transport between Golgi cisternae of general soluble cargos. After passing 
through the Golgi apparatus, the procollagen molecules are incorporated in 
secretory vesicles where the aggregates increase in length [115]. 

The procollagen molecules that have completed their folding and modifi- 
cations are secreted into the extracellular space, probably in an aggregated 
form. During or following their secretion, the N- and C-propeptide domains are 
cleaved from procollagens by specific processing enzymes, the N- and C-proteinases, 
and the resulting collagen molecules are incorporated into collagen fibrils (see Fig. 2). 


6 
Production of Recombinant Collagens 


There is growing interest in collagen as a promising biomaterial in cosmetic 
surgery, tissue engineering and drug delivery systems [116, 117]. Most of the 
collagen currently used for these purposes is purified from the tissues of cows. 
The use of animal collagens is associated with the unavoidable risks of induc- 
ing gelatin allergies [118] or infection with prions that may cause bovine 
spongiform encephalopathy (BSE) [119]. Consequently, the production of 
human collagen, which is not associated with these risks, has been attempted 
in several laboratories. The information retrieved from these attempts has not 
only been useful for the future applications of recombinant human collagens 
but has also provided novel insights into the mechanism of procollagen biosyn- 
thesis. 

To date, a number of different eukaryotic cells have been tested for the production 
of recombinant collagens, namely, budding [120, 121] and methylotrophic yeasts 
[122, 123], mammalian cell lines [124-126], and insect cells [127-130]. Trans- 
genic animals and plants have also been candidate hosts for the production 
of recombinant human procollagen. It has been reported that recombinant 
procollagen can be purified from the milk of transgenic mice [131, 132], from 
cocoons of transgenic silkworms [133], and tobacco plants [134-136]. All of 
these hosts have been shown to produce triple helical collagen or procollagen 
molecules but their thermal stability may differ depending on the systems used. 
With the exception of the mammalian cell system, it was necessary to co- 
express P4-H subunits to produce properly hydroxylated and hence stable 
triple helical molecules. Moreover, although the production of collagen using 
the methylotrophic yeast Pichia pastoris yielded as much as 1.1 g/l of collagen, the 
procollagen molecules were not secreted into the media [122]. This retention of 
procollagen within the cells has also been observed in other cell systems 
except for the mammalian cell lines. In addition, the systems using the mam- 
mary glands of transgenic mice and the silk glands of transgenic silkworms 
employ genetically engineered procollagen that has shortened triple helical 
domains. It has not been shown if these systems are capable of producing 
full-length procollagen. These observations emphasize the existence of a pro- 
collagen-specific intracellular trafficking system in innate procollagen-produc- 
ing cells. In addition, as yeasts, insects, and plants are not likely to possess Hsp47, 
their inability to secrete procollagen molecules may relate to the lack of 
Hsp47. Consequently, it is tempting to speculate that Hsp47 may be involved in 
the procollagen-specific trafficking in innate procollagen-producing cells [137]. 

The Pichia pastoris expression system has also led to the discovery of an 
unusual control mechanism between P4-H and its procollagen substrate, 
namely, that the formation of the stable α2β2 tetramer of P4-H requires the 
expression of procollagen and that the folding of procollagen requires the 
expression of the active tetrameric enzyme [122]. In addition, even more striking 
observations came from Saccharomyces cerevisiae, where human proα-chains 
lacking the N- and C-propeptides are folded into correctly aligned 
triple helical molecules [121]. This unexpected observation challenges the consensus 
understanding that the C-propeptide domain is necessary for the initial 
association of the proα-chains (see above). 


7 
Conclusions and Future Perspectives 


As described in this chapter, procollagen biosynthesis in cells constitutes a very 
complex system involving a number of unique processes that are generally not 
found in the synthesis and secretion of globular proteins. Recent progress has 
provided remarkable support to the widely accepted model of procollagen 
biosynthesis that is illustrated in Fig. 2. It should be noted that this progress has 
been made possible by applying novel techniques to investigate the events that 
occur during procollagen biosynthesis. One of these techniques is that of 
Bulleid and co-workers, who developed the system that utilizes the in vitro 
translation/translocation in semi-permeabilized cells [138]. This elegant system, 
along with the utilization of genetically engineered proα-chains, has made 
it possible to investigate the events that occur in the ER. This system has been 
used to extend our understanding of the functions of ER-resident molecular 
chaperones in procollagen folding. The development of a system for single- 
molecule imaging in living cells has also contributed to the study of the 
dynamic trafficking of procollagen between organelles [108]. This technology 
will enable us to obtain further insight into the intracellular trafficking of 
procollagen, including how procollagen quality is controlled and how the chap- 
erones return to the ER. 

The most striking finding in recent years is that the melting temperature of 
mammalian mature collagen, which is an indicator of the thermal stability of 
collagen molecules, is apparently below body temperature [39]. This finding 
has caused a paradigm shift in our understanding of the folding and stability 
of procollagen since the procollagen molecule cannot be expected to fold into 
its destined structure at the body temperature. Indeed, even after the structure 
is formed, the thermal unfolding may occur at least locally. This may be the rea- 
son why the procollagen molecule requires the Hsp47 chaperone, as it may 
specifically stabilize the triple helical assembly of procollagen. It may also be 
the reason why Hsp47 is the only heat-inducible stress protein in the ER of 
mammalian cells. Once procollagen molecules reach the cis-Golgi, they self-as- 
sociate laterally and never dissociate thereafter [10]. 

Of all proteins that are integrally involved in the biosynthesis of procollagen, 
the function of Hsp47 seems to be the least clear. Indeed, the use of mammalian 
and other model systems, including yeasts and invertebrates, to produce recombinant 
collagen has led to contradictory observations regarding the function of 
Hsp47. In mammals, Hsp47 is essential for proper procollagen biosynthesis while 
in some invertebrates, procollagen biosynthesis can be accomplished without 
Hsp47. It is thus possible to speculate that Hsp47 is a specialized chaperone that 
plays a role only in collagen-producing mammalian cells. 

In this chapter, we focused on the procollagen synthesis in mammalian cells. 
However, multicellular animals ubiquitously synthesize collagen, and animals 
other than arthropods and nematodes have been shown (or predicted) to pos- 
sess fibril-forming collagens [139]. Interestingly, Hsp47 has not been reported 
to be expressed in those animals. Do invertebrates possess similar machineries 
of procollagen synthesis or are their machineries completely different? We do not 
have enough information to answer the question, but we speculate that the system 
may vary from organism to organism. For instance, the collagen of the vent 
worm Riftia pachyptila that inhabits deep sea volcanic vents has been suggested to 
be stabilized by the addition of the di- and tri-saccharides of galactose to threonine 
residues in the Y-positions [140-142]. This type of collagen modification 
has also been suggested to occur in the earthworm Lumbricus terrestris and the 
clamworm Nereis virens [143, 144]. Thus, annelid collagens are likely to employ 
a different mechanism to stabilize the triple helices. Searching for the prototype 
of the mammalian system of procollagen production in lower organisms will 
provide further information for the mechanisms by which procollagen is syn- 
thesized and how these mechanisms have evolved. Moreover, it should be noted 
that bacteria, bacteriophages, and viruses possess genes encoding collagen-like 
molecules [145, 146]. Recent analyses suggested that triple helical collagen-like 
molecules containing tandem Gly-X-Y repeats can be synthesized in bacteria (Xu 
et al. 2002) which do not possess an ER suited to fold and modify eukaryotic 
procollagens. Analysis of the collagen biosynthetic pathways in such organisms 
would provide novel insights into the mammalian counterpart. 

In the triple helix-formation of heterotrimeric procollagens, there is another 
question that has not yet been resolved. Within the triple helix, polypeptide 
chains are placed with one-residue-staggering along the helical axis. This would 
make it possible to form structural isomers of procollagen when it consists of 
different proα-chains. For instance, in the case of type I procollagen, the theoretical 
registers of the three chains along the helical axis could be α1α1α2, 
α1α2α1, and α2α1α1. These hypothetical isomers of procollagen are expected 
to resemble each other in terms of their overall molecular structure and ther- 
mal stability. However, when looking more closely at these molecules, the 
surface profiles of their triple helices should differ. Does the cell selectively 
synthesize only one of the isomers? A clue to selective production has been 
obtained by using type IV collagen [147]. The residues (one Arg and two Asp 
residues) on type IV collagen, which are critical for α1β1 integrin-binding, are 
located on different chains. By using the fluorescence resonance energy transfer (FRET) 
technique followed by residue-specific fluorescent labeling, this region has been 
determined to have the register of α2(IV)α1(IV)α1(IV). A subsequent study 
was then conducted with a heterotrimeric collagen-model peptide with the corresponding 
integrin-binding sequence of type IV collagen [148]. As expected, 
the triple helical model with the native chain-register showed a higher affinity 
for α1β1 integrin, but the thermal stability of the natively-aligned triple helix 
was lower than the non-natively staggered one. This observation implies that 
the conformational stability of the natively staggered triple helix may not 
always, at least locally, be higher than that of mis-staggered ones. 
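
The number of possible chain registers discussed above can be checked by enumeration. The following is a minimal sketch, assuming that a register is fully described by the order of the three chains along the one-residue stagger and that identical chains are indistinguishable; the function name and the chain labels are illustrative only.

```python
from itertools import permutations


def distinct_registers(chains):
    """Return the distinct chain orders (registers) for a heterotrimer."""
    # A set removes orders that differ only by swapping identical chains.
    return sorted(set(permutations(chains)))


# Type I procollagen: two proα1(I) chains and one proα2(I) chain.
type_i_chains = ("a1", "a1", "a2")
registers = distinct_registers(type_i_chains)
for register in registers:
    print("-".join(register))
```

For a heterotrimer with two identical chains the enumeration gives three registers, and for three different chains it would give six, which is why selective synthesis of a single isomer requires some form of control.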

As discussed in Section 6, the production of recombinant human collagens 
has been actively investigated in the hope of generating commercially available 
alternatives to animal collagens. However, most of the systems tested have 
failed, probably because of missing biosynthetic machineries in the respective 
hosts. The elucidation of the entire biosynthetic pathway that leads to mam- 
malian collagen will greatly aid the production of recombinant collagens. 

Control of collagen synthesis by effective drugs is a promising way to treat 
various fibrotic diseases, such as those of the liver, lung, kidney, and arte- 
riosclerosis. All of the proteins that are specifically involved in procollagen 
biosynthesis, including the modifying enzymes and the chaperones, could be 
potential drug targets. To date, many compounds that inhibit P4-H activity have 
been identified and synthesized. Some of these, such as HOE 077 and Safironil, 
have been reported to effectively prevent the progression of fibrosis in vivo 
[149]. There are at least six P4-H enzymes in humans. Three are procollagen 
P4-Hs that show different tissue distributions and three are hypoxia-induced 
P4-Hs that control oxygen homeostasis [150]. As their mechanisms of enzyme 
action are very similar, the development of inhibitors that are specific for the 
P4-H isoenzymes is expected to be very useful in the treatment of fibrotic 
diseases. Inhibition of Hsp47 function may also be a promising objective of 
antifibrotic drugs, since expression of Hsp47 is elevated in various fibrotic 
diseases [151, 152]. Although small molecules that inhibit Hsp47 function have 
not yet been discovered, it has been shown that the administration of antisense 
oligonucleotides against Hsp47 suppresses the accumulation of collagen in 
experimental glomerulonephritis [153] and peritoneal fibrosis [154]. 

Abstract Collagen synthesis involves many post-translational modifications, the collagen-specific 
intracellular modifications consisting of hydroxylation of certain proline and lysine 
residues to 4-hydroxyproline, 3-hydroxyproline and hydroxylysine, and glycosylation of 
some of the hydroxylysine residues to galactosylhydroxylysine and glucosylgalactosylhy- 
droxylysine. The five enzymes catalyzing these modifications are collagen prolyl 4-hydroxy- 
lase, prolyl 3-hydroxylase, lysyl hydroxylase, collagen galactosyltransferase and collagen 
glucosyltransferase, all residing within the lumen of the endoplasmic reticulum. Vertebrate 
collagen prolyl 4-hydroxylase, prolyl 3-hydroxylase and lysyl hydroxylase have at least three 
isoenzymes, and all collagen hydroxylases belong to the group of 2-oxoglutarate dioxygenases, 
which require Fe2+, 2-oxoglutarate, O2 and ascorbate. Although the three-dimensional 
structures of these enzymes are still unknown, detailed information is available on their cat- 
alytically critical residues and reaction mechanisms. 4-Hydroxyproline residues have an 
essential role in providing the collagen triple helices with thermal stability, and collagen pro- 
lyl 4-hydroxylase is therefore regarded as an attractive target for chemical inhibition to con- 
trol excessive collagen accumulation, e.g. in fibrotic diseases and cases of severe scarring. 


Keywords Collagen - Collagen prolyl 4-hydroxylase - Prolyl 3-hydroxylase - 
Lysyl hydroxylase - Collagen glycosyltransferase 


1 
Introduction 


Collagen synthesis involves an unusually large number of co-translational and 
post-translational modifications, many of which are unique to collagens and 
other proteins with collagen-like domains. The aim of this chapter is to review 
current information on the collagen-specific enzymes involved in the intra- 
cellular modification steps, which include hydroxylation of certain proline and 
lysine residues to 4-hydroxyproline, 3-hydroxyproline and hydroxylysine, and 
glycosylation of some of the hydroxylysine residues to galactosylhydroxylysine 
and glucosylgalactosylhydroxylysine (Fig. 1) [1-4]. These steps are catalyzed by 
five specific enzymes: collagen prolyl 4-hydroxylase, prolyl 3-hydroxylase, 
lysyl hydroxylase, collagen galactosyltransferase and collagen glucosyltransferase, 
all residing within the lumen of the endoplasmic reticulum (ER) [1-4]. 

Fig. 1 The main intracellular steps in the synthesis of a fibril-forming collagen. The collagen 
chains are synthesized on membrane-bound ribosomes and secreted into the lumen of 
the endoplasmic reticulum. The collagen-specific modifications include hydroxylation of 
certain proline and lysine residues to 4-hydroxyproline, 3-hydroxyproline and hydroxylysine, 
and glycosylation of some of the hydroxylysine residues to galactosylhydroxylysine and 
glucosylgalactosylhydroxylysine. Certain asparagine residues in the C propeptides, or both 
the N and C propeptides, are also glycosylated by reactions similar to those in many other 
proteins. Association of the three collagen chains is directed in a type-specific manner by 
recognition sequences in their C propeptides, and the formation of intrachain and interchain 
disulfide bonds is catalyzed by protein disulfide isomerase. The triple-helical domain then 
nucleates at its C-terminal end and the triple helix is propagated in a zipper-like fashion 
towards the N terminus. (Figure labels: protein disulfide isomerase; assembly of triple helix; 
secretion of procollagen.) 

The collagen hydroxylases, i.e. collagen prolyl 4-hydroxylase, prolyl 3- 
hydroxylase and lysyl hydroxylase, catalyze the formation of 4-hydroxyproline, 
3-hydroxyproline and hydroxylysine almost exclusively in -X-Pro-Gly-, -Pro-4Hyp-Gly- 
and -X-Lys-Gly- sequences, respectively (the repeating -Gly-X-Y- sequences 
that are typical of collagens and collagen domains in other proteins are 
written here as -X-Y-Gly- because of the hydroxylation properties of the three 
hydroxylases; see below) [5, 6]. Collagen prolyl 4-hydroxylase and lysyl hy- 
droxylase were cloned about 15 years ago, while prolyl 3-hydroxylase was 
cloned only in 2004, all the human enzymes having at least three isoenzymes 
[4-8]. Collagen galactosyltransferase and collagen glucosyltransferase have not 
been cloned yet, although one of the lysyl hydroxylase isoenzymes has also 
been shown to possess low amounts of the collagen glycosyltransferase 
activities [9-11]. 
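
The sequence preferences of the three hydroxylases can be summarized in a small scanning routine. This is only a sketch under simplifying assumptions: the input is read in a fixed -X-Y-Gly- frame, every Y-position proline or lysine is reported as a candidate regardless of neighbouring residues, and 3-hydroxylation is reported wherever an X-position proline precedes a Y-position proline that could become 4-hydroxyproline. The function name and the toy sequence are hypothetical, and real substrate selection is more restrictive.

```python
def candidate_hydroxylation_sites(sequence: str):
    """List (0-based position, enzyme) candidates in -X-Y-Gly- triplets."""
    sites = []
    seq = sequence.upper()
    for i in range(0, len(seq) - 2, 3):
        x, y, gly = seq[i], seq[i + 1], seq[i + 2]
        if gly != "G":
            continue  # not a collagen-like triplet in this reading frame
        if y == "P":
            # -X-Pro-Gly-: Y-position proline, candidate for 4-hydroxylation
            sites.append((i + 1, "prolyl 4-hydroxylase"))
            if x == "P":
                # -Pro-4Hyp-Gly-: X-position proline next to a (future) 4Hyp
                sites.append((i, "prolyl 3-hydroxylase"))
        elif y == "K":
            # -X-Lys-Gly-: Y-position lysine, candidate for hydroxylation
            sites.append((i + 1, "lysyl hydroxylase"))
    return sites


# Hypothetical toy sequence of four triplets (illustration only).
toy_sites = candidate_hydroxylation_sites("PPGAKGEPGPAG")
for position, enzyme in toy_sites:
    print(position, enzyme)
```

Writing the repeat as -X-Y-Gly- rather than -Gly-X-Y- makes the scan straightforward, because each enzyme's target residue and the following glycine fall within the same triplet.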


2 
Occurrence and Functions of 4-Hydroxyproline, 3-Hydroxyproline 
and Hydroxylysine in Animal Proteins 


Most of the 4-hydroxyproline, 3-hydroxyproline and hydroxylysine residues in 
animal proteins are found in the -X-4Hyp-Gly-, -3Hyp-4Hyp-Gly- and -X-Hyl-Gly- 
sequences of collagens. 4-Hydroxyproline, and hydroxylysine in most but 
not all cases, are also present in more than 20 additional proteins with collagen-like 
triple-helical domains, including the subcomponent C1q of complement, 
a C1q-like factor, adiponectin, at least eight collectins and three ficolins 
(humoral lectins of the innate immune defence system), the tail structure of 
acetylcholinesterase, three macrophage receptors, ectodysplasin, two EMILINS 
(elastic fibre-associated glycoproteins) and a src-homologous-and-collagen 
protein [3, 4, 6]. 

4-Hydroxyproline residues have an essential role in providing the collagen 
triple helices with thermal stability. The denaturation temperature of a non-hy- 
droxylated type I collagen is only 24 °C, while a triple helix consisting of fully 
hydroxylated collagen polypeptide chains is stable up to 39 °C [12, 13]. Non-hy- 
droxylated collagen polypeptide chains thus cannot form functional triple-he- 
lical molecules in vivo, and almost complete hydroxylation of the Y-position 
proline residues of the -X-Y-Gly- triplets is required for the generation of a col- 
lagen molecule that is stable at 37 °C. The stabilizing effect of 4-hydroxyproline 
residues on the collagen triple helix is most likely due to the inductive effects 
of the electron-withdrawing hydroxy group of 4-hydroxyproline residues on 
the pyrrolidine ring pucker, the φ and ψ torsional angles and the peptide bond 
trans/cis ratio of the substituted proline [14]. The hydroxylysine residues of collagens 
have two important functions: their hydroxyl groups serve as attachment 
sites for carbohydrate units, and they are essential for the stability of the 
intermolecular cross-links [6]. The function of 3-hydroxyproline residues is not 
yet well understood, but it has recently been suggested that they may modulate 
the stability of the triple helix, allowing for specific regions of lower stability 
that may be necessary for the assembly of certain supramolecular structures, 
e.g. the meshwork structure formed by type IV collagen in basement mem- 
branes [15, 16]. 

Various collagen types show only relatively small, but distinct differences 
in their 4-hydroxyproline content [17]. A typical example is the most abun- 
dant collagen, type I, which contains about 100 4-hydroxyproline residues per 
1000 amino acids, i.e. about 50% of the incorporated proline residues are 
hydroxylated [17]. In contrast, the 3-hydroxyproline and hydroxylysine con- 
tents of different collagen types vary markedly, the values ranging from 0 
to over 10 residues per 1000 amino acids in the case of 3-hydroxyproline, 
and from about 5 to 70 in that of hydroxylysine [17]. Further variations in 
the amounts of 4-hydroxyproline, 3-hydroxyproline and hydroxylysine are 
found within the same collagen type in different tissues and even in the same 
tissue in various physiological and pathological states [6, 17]. Because of the 
critical function of the 4-hydroxyproline residues, the amount of this residue 
in a certain collagen type varies only within narrow limits, whereas large 
variations are found in 3-hydroxyproline and hydroxylysine. The 3-hydroxy- 
proline content of type IV collagen, for example, can vary from about 1 to 20 
per 1000 residues, and the hydroxylysine content of type I collagen from about 
6 to 17/1000 [17]. 
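
The figures quoted for type I collagen imply a total proline content that can be made explicit with a short calculation. The sketch below assumes that the stated hydroxylated fraction refers to 4-hydroxylation only and ignores the small 3-hydroxyproline contribution; the function name is illustrative.

```python
def incorporated_proline_per_1000(hyp_per_1000: float, hydroxylated_fraction: float) -> float:
    """Total incorporated proline (Pro + 4Hyp) per 1000 residues."""
    if not 0 < hydroxylated_fraction <= 1:
        raise ValueError("hydroxylated_fraction must be in (0, 1]")
    return hyp_per_1000 / hydroxylated_fraction


# Values quoted in the text for type I collagen.
hyp = 100        # 4-hydroxyproline residues per 1000 amino acids
fraction = 0.5   # about 50% of incorporated proline is hydroxylated

total_proline = incorporated_proline_per_1000(hyp, fraction)
unmodified_proline = total_proline - hyp
print(total_proline, unmodified_proline)
```

On these numbers roughly one residue in five of type I collagen derives from proline, half of which remains unmodified.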

4-Hydroxyproline is also found in -Gly-X-Y- sequences of elastin, the main 
component of elastic fibres [6, 17]. Elastin differs from collagens and proteins 
with collagen-like domains, however, in that its -Gly-X-Y- repeats do not form 
triple-helical structures. Although 4-hydroxyproline and hydroxylysine were 
long thought to be almost exclusively present in the Y positions of repeating 
-X-Y-Gly- sequences (the repeating collagenous -Gly-X-Y- sequences are again 
written here as -X-Y-Gly- because of the hydroxylation properties of the cor- 
responding hydroxylases; see below), some exceptions were already recognized 
quite early on. 4-Hydroxyproline has been identified in single -X-4Hyp-Ala- 
triplets in the α3 chain of type IV collagen and in two of the polypeptide chains 
of the subcomponent C1q of human complement, and the telopeptides in some 
fibril-forming collagens contain hydroxylysine in one -X-Hyl-Ala- or -X-Hyl- 
Ser- sequence [6, 17]. In addition, 4-hydroxyproline and hydroxylysine are 
found in some proteins that contain a single -X-Y-Gly- sequence. For example, 
the kinin mixture in human plasma, urine and ascitic fluid contains both lysyl 
bradykinin and small amounts of hydroxyproline-lysyl-bradykinin with a single 
4-hydroxyproline residue in the sequence -Pro-4Hyp-Gly- [18, 19]. A single 
4-hydroxyproline residue is also found in an -Arg-4Hyp-Gly- sequence in the 
hydroxyproline luteinizing hormone-releasing hormone [20]. Correspondingly, 
anglerfish somatostatin-28 has a single hydroxylysine residue in the sequence 
-Trp-Hyl-Gly- [21, 22]. 
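
The -X-Y-Gly- pattern described above can be illustrated with a minimal sketch that scans a peptide for triplets with proline or lysine in the Y position. The function name `candidate_sites` and the one-letter sequence of lysyl-bradykinin are supplied here for illustration and are not taken from the text; the scan only flags sequence context and says nothing about whether a residue is actually hydroxylated.

```python
def candidate_sites(peptide: str) -> list[tuple[int, str]]:
    """Return (1-based position of Y, triplet) for every X-Y-Gly triplet
    whose Y residue is proline (P) or lysine (K)."""
    sites = []
    # Start at index 1 so that every Y residue has an X residue before it.
    for i in range(1, len(peptide) - 1):
        x, y, nxt = peptide[i - 1], peptide[i], peptide[i + 1]
        if y in "PK" and nxt == "G":
            sites.append((i + 1, x + y + nxt))
    return sites


# Lysyl-bradykinin in one-letter code:
# Lys-Arg-Pro-Pro-Gly-Phe-Ser-Pro-Phe-Arg
print(candidate_sites("KRPPGFSPFR"))  # [(4, 'PPG')]
```

The single hit corresponds to the -Pro-4Hyp-Gly- sequence mentioned above; exceptions such as -X-4Hyp-Ala- would need an extended rule.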

Most invertebrate collagens resemble the vertebrate collagens and have
roughly similar 4-hydroxyproline contents in their triple-helical domains [6, 17].
Earthworm cuticle collagen is unique among all collagens studied so far, however,
in that more than 90% of the incorporated proline residues are found in
the form of 4-hydroxyproline, while the other extreme is represented by Ascaris
cuticle collagen in which only about 5% of the proline residues are hydroxylated
[6, 17]. Furthermore, most of the 4-hydroxyproline residues in earthworm
cuticle collagen are found in the X positions of the repeating -X-Y-Gly- se- 
quences [6, 17]. 

3-Hydroxyproline has been identified in collagens only in the sequence -Gly- 
3Hyp-4Hyp-Gly-. It has also been found in several proteins of the parasitic 
trematode Fasciola hepatica, including secreted cathepsin L-like proteinases, 
but not collagen. The 3-hydroxyproline residues of the F. hepatica proteins are 
present in sequences showing no consensus amino acid sequence [23, 24]. 

4-Hydroxyproline residues have very recently been identified in the
hypoxia-inducible transcription factor HIF, where they have been shown to have
a novel function in the means by which HIF controls gene expression in
response to changes in the amount of cellular oxygen [25, 26]. HIF is an αβ
heterodimer in which the stability of the α subunit is regulated in an
oxygen-dependent manner. The HIF-α subunit is synthesized continuously, and one or
both critical proline residues in two -Leu-X-X-Leu-Ala-Pro- sequences are
hydroxylated under normoxic conditions [25, 26]. Hydroxylation of HIF-α is not
catalyzed by collagen prolyl 4-hydroxylases [26] but by a novel cytoplasmic HIF
prolyl 4-hydroxylase family [27-29], the resulting 4-hydroxyproline residue
being essential for the binding of HIF-α to the von Hippel-Lindau E3 ubiquitin
ligase complex and for rapid subsequent proteasomal degradation in normoxia
[25, 26]. Under hypoxic conditions the oxygen-requiring hydroxylation step is
prevented, HIF-α escapes degradation and dimerizes with HIF-β. The dimer is
then translocated into the nucleus and becomes bound to the HIF-responsive
elements in a number of hypoxia-inducible genes, such as those for
erythropoietin, vascular endothelial growth factor and glycolytic enzymes [25, 26]. A
striking difference in the catalytic properties of the HIF and collagen prolyl
4-hydroxylases is that the Km values of the HIF prolyl 4-hydroxylases for oxygen
are very high, even slightly above the concentration of dissolved O2 in air, while
the Km values of collagen prolyl 4-hydroxylases are about one-sixth of these
[30]. These data are consistent with the functions of the two classes of prolyl
4-hydroxylase. The HIF prolyl 4-hydroxylases are effective oxygen sensors that
are inhibited by even a small decrease in O2 concentration, whereas the
collagen prolyl 4-hydroxylases must be able to function in situations with low O2
concentrations, as in wounds and tissues of low vascularity.
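
The consequence of the different Km values for oxygen can be sketched with the Michaelis-Menten relation v/Vmax = [S]/(Km + [S]). The numerical values below are hypothetical and chosen only to respect the stated ratio (collagen enzyme Km about one-sixth of the HIF enzyme Km); they are not data from the text, and the function name `relative_rate` is likewise an illustrative choice.

```python
def relative_rate(o2_um: float, km_um: float) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: [S] / (Km + [S])."""
    return o2_um / (km_um + o2_um)


# Hypothetical, illustrative values in micromolar (assumptions, not source data).
KM_HIF_P4H = 240.0                 # assumed to lie slightly above air-saturated O2
KM_COLLAGEN_P4H = KM_HIF_P4H / 6   # "about one-sixth of these"

for o2 in (200.0, 100.0, 20.0):
    hif = relative_rate(o2, KM_HIF_P4H)
    col = relative_rate(o2, KM_COLLAGEN_P4H)
    print(f"O2 = {o2:5.0f} uM   HIF P4H: {hif:.2f}   collagen P4H: {col:.2f}")
```

Under these assumptions, halving the oxygen concentration reduces the rate of the high-Km enzyme proportionally more than that of the low-Km enzyme, which is the behaviour expected of an oxygen sensor.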

3
Collagen Hydroxylases 


3.1 
Collagen Prolyl 4-Hydroxylases 


3.1.1 
Vertebrate Collagen Prolyl 4-Hydroxylases 


Collagen prolyl 4-hydroxylase was first purified to near homogeneity more 
than 30 years ago from chick embryos by conventional procedures. The devel- 
opment of two affinity purification methods and various recombinant expres- 
sion systems has subsequently facilitated the isolation of large quantities of 
pure enzyme [5-7, 31]. 

The collagen prolyl 4-hydroxylases from all vertebrate sources studied so far
are α2β2 tetramers (Fig. 2) in which the two catalytic sites are located in the α
subunits, while the β subunits are identical to the multifunctional enzyme and
chaperone protein disulfide isomerase (PDI) [5-7]. The molecular weight of the
collagen prolyl 4-hydroxylase tetramer is about 240 kDa, those of the α and β
subunits being about 63 kDa and 58 kDa, respectively [5-7]. The monomeric
subunits possess no hydroxylase activity, and all attempts to assemble an active
enzyme tetramer from the dissociated monomers in vitro have been
unsuccessful [5-7]. Active recombinant collagen prolyl 4-hydroxylases have been
successfully produced in insect [32], yeast [33] and plant cells [34], however, by
coexpressing the α and β subunits. This has made it possible to study the
molecular and functional properties of various collagen prolyl 4-hydroxylases in
detail and to develop high-level recombinant expression systems for
hydroxylated human collagens, not only in mammalian cells but also in insect cells,
yeasts and plants [33-38].
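
As a quick consistency check on the subunit masses quoted above, the mass of a tetramer with two catalytic and two PDI subunits can be summed directly. This is only arithmetic on the rounded values given in the text; the symbols are introduced here for illustration.

```latex
M_{\mathrm{tetramer}} \approx 2M_{\alpha} + 2M_{\beta}
  = 2(63\ \mathrm{kDa}) + 2(58\ \mathrm{kDa})
  = 242\ \mathrm{kDa} \approx 240\ \mathrm{kDa}
```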

In addition to its critical role in the hydroxylation of collagen chains, and
thus in the generation of stable collagen triple helices, collagen prolyl
4-hydroxylase also plays an important role in collagen molecule quality control. The
enzyme forms a stable association with non-helical collagen chains that is not
dependent on the extent of their hydroxylation and retains the non-helical
trimers within the ER [39]. Once the folding of the triple-helical domain is
complete, the enzyme dissociates rapidly and releases the molecule for further
transport [39].

Fig. 2 Schematic representation of the forms of collagen prolyl 4-hydroxylase characterized
in various species (panels: vertebrates, types I, II and III; C. elegans; B. malayi;
D. melanogaster). The molecular composition of the active enzyme formed by the
C. elegans PHY-3 and C. elegans PDI-1 is currently unknown


3.1.1.1 
The Catalytic α Subunit


The vertebrate α subunit was first cloned from human [40] and chicken [41] in
1989, and subsequently from the rat [42] and mouse [43]. Collagen prolyl
4-hydroxylase was long assumed to be of one type only, with no isoenzymes, but
two additional α subunit isoforms, designated α(II) and α(III), have now been
cloned and characterized from human and rodent sources [43-46].
Correspondingly, the α subunit first identified is now called α(I). The α(II) and α(III)
subunits also become assembled into α2β2 tetramers with PDI (Fig. 2), and the
[α(I)]2β2, [α(II)]2β2 and [α(III)]2β2 tetramers are referred to as type I, II and
III collagen prolyl 4-hydroxylases, respectively [43-46]. Insect cell coexpression
experiments have suggested that it is highly unlikely that vertebrate α subunits
would form mixed tetramers containing two kinds of catalytic subunit [44].

The type I collagen prolyl 4-hydroxylase is the main form in most vertebrate 
cell types and tissues, but the type II enzyme is the major form in chondrocytes, 
osteoblasts, endothelial cells and cells in epithelial structures [47,48]. The type II 
enzyme represents at least 70% and 80% of the total prolyl 4-hydroxylase ac- 
tivity in cultured mouse chondrocytes and cartilage, respectively [47], and it 
may thus have a major role in the development of cartilage and cartilaginous
bone. The type III collagen prolyl 4-hydroxylase is expressed in many human tis- 
sues, but at much lower levels than the type I and type II enzymes [45]. 

The human α(I), α(II) and α(III) subunits consist of 517, 514 and 525
residues, respectively, with signal peptides of additional 17, 21 and 19 residues
(Fig. 3) [40, 44-46]. The overall amino acid sequence identity between the
processed human α(I) and α(II) subunits is 65%, and those between α(I) and
α(III) and between α(II) and α(III) are 35-37% [44, 45]. The highest degree of
identity is seen within the catalytically important C-terminal region [5-7, 49],
the 120 C-terminal residues of the human α(I) subunit being 80% identical to
those of human α(II), while α(III) is 56-57% identical to α(I) and α(II) in this
region [44, 45]. All four critical residues at the catalytic site, which will be
discussed in more detail below, are conserved in all three α subunits (Fig. 3).

The peptide-substrate-binding domain of the collagen prolyl 4-hydroxylases 
is distinct from the catalytic C-terminal domain and is located between residues 
Phe144 and Ser244 in the human α(I) subunit (Fig. 3) [50]. Recent nuclear magnetic
resonance, surface plasmon resonance and isothermal titration calorimetry
studies have indicated that many characteristic features of the binding of
peptides to collagen prolyl 4-hydroxylases (see below) can be explained by the 
properties of binding to this domain rather than the catalytic domain [51]. 

Fig. 3 Schematic representation of the human collagen prolyl 4-hydroxylase α(I), α(II)
and α(III) subunits, the C. elegans PHY-1, PHY-2 and PHY-3 polypeptides, the B. malayi and
O. volvulus PHY-1 polypeptides, and the D. melanogaster α(I) subunit. Numbering of the
amino acids starts with the first residue of the processed polypeptide. Cysteine residues and
potential attachment sites for asparagine-linked oligosaccharides are shown below the
polypeptides, and the catalytically critical residues are shown above them. The
peptide-substrate-binding domains in the human α subunits are also indicated

The human α(I), α(II) and α(III) subunits have five conserved cysteine
residues, the α(II) and α(III) subunits each having one additional cysteine
between the conserved cysteines 4 and 5, and 1 and 2, respectively [40, 44, 45]
(Fig. 3). The collagen prolyl 4-hydroxylase tetramer has no interchain disulfide
bonds between the subunits [52], but site-directed mutagenesis studies on the
α(I) subunit have indicated that intrachain disulfide bonds that are essential for
enzyme tetramer assembly are formed between the second and third conserved
cysteines and between the fourth and fifth (Fig. 3) [53, 54]. The human α(I),
α(II) and α(III) subunits have two N-glycosylation sites, the position of the
more N-terminal site being conserved between the α(I) and α(II) subunits,
while the other positions are not conserved (Fig. 3) [44, 45]. Site-directed
mutagenesis studies of the human α(I) subunit and deglycosylation studies of the
type III enzyme have shown that glycosylation has no role in the assembly of
the enzyme tetramer or its catalytic activity [45, 54].

The genes encoding the human α(I), α(II) and α(III) subunits are present on
chromosomes 10q21.3-23.1, 5q31 and 11q12, respectively, and have very
similar exon-intron organizations [45, 55-57]. Despite this similarity, two forms of
α(I) and α(II) mRNA have been described, resulting from mutually exclusive
alternative splicing that affects different homologous exons, nos. 9 and 10 in the
case of the α(I) gene, and 12a and 12b in that of the α(II) gene, whereas no
evidence has been found for alternative splicing of the α(III) transcript [40, 45,
56, 57].

No human heritable diseases have been identified so far that are caused by
mutations in the collagen prolyl 4-hydroxylase α subunit genes. Knock-out
mice have recently been generated for the α(I) (Holster T, Pakkanen O, Soininen
R, Sormunen R, Kivirikko KI, Myllyharju J, unpublished data) and α(II)
subunits (Pakkanen O, Holster T, Soininen R, Sormunen R, Kivirikko KI,
Myllyharju J, unpublished data). α(I) knock-out causes embryonic lethality, whereas
α(II) knock-out mice are born with no obvious phenotypic abnormalities.


3.1.1.2 
The Multifunctional β Subunit


Cloning of the β subunit of human collagen prolyl 4-hydroxylase showed,
surprisingly, that it is identical to another enzyme, protein disulfide isomerase
(PDI) [58], an abundant protein within the ER that catalyzes disulfide bond
formation and rearrangement during protein folding [5-7, 59]. PDI is a
multifunctional polypeptide which serves as the β subunit in all currently known
vertebrate collagen prolyl 4-hydroxylases and in the microsomal triglyceride
transfer protein dimer, and as a chaperone-like polypeptide that binds various
newly synthesized polypeptides within the lumen of the ER and probably
assists in their folding [5-7, 59].

The human PDI polypeptide consists of 491 residues and a signal peptide of
17 additional residues [58]. It is a modular protein composed of four domains,
a, b, b′ and a′, and a highly acidic extension, c (Fig. 4) [5-7, 58, 59]. The a and
a′ domains each contain a catalytic site for PDI activity, with a -Cys-Gly-His-Cys-
motif, and are similar in sequence to thioredoxin [5-7, 58-60], while the
homologous b and b′ domains do not have catalytic site sequences and show
no sequence similarity to thioredoxin [5-7, 58, 59]. NMR characterization of
domains a, b, and a′ indicates that they all have a thioredoxin fold, however
[61-64]. PDI thus most probably consists of two catalytically active thioredoxin
modules and two inactive ones [62].

124 J. Myllyharju

Fig. 4 Schematic representation of the domain structure of the human PDI polypeptide
(domains a, b, b′, a′ and c, with a CGHC motif marked in each of the a and a′ domains).
Numbering of the amino acids starts with the first residue of the processed polypeptide. The
two -Cys-Gly-His-Cys- sequences representing the catalytic sites for PDI activity are
indicated. Modified from [5] with permission from Elsevier

PDI has at least two major functions in vertebrate collagen prolyl
4-hydroxylase tetramers. Its C terminus contains the -Lys-Asp-Glu-Leu motif, which
is both necessary and sufficient for retention of the polypeptide, and hence also
the collagen prolyl 4-hydroxylase tetramer, within the lumen of the ER [65].
When an enzyme tetramer is dissociated by various means [5, 6], or when the
vertebrate α subunit is expressed alone in foreign host cells without PDI, the α
subunit forms insoluble aggregates with no prolyl 4-hydroxylase activity
[32, 33, 66]. Thus, PDI has an important function in keeping the α subunits in
solution. Another chaperone, BiP, also forms soluble complexes with the α
subunit, but with no prolyl 4-hydroxylase activity [67, 68]. Therefore, the function
of PDI in the collagen prolyl 4-hydroxylase tetramer is not only that of keeping
the α subunits in solution but is more specific, most likely that of keeping
them in a catalytically active, non-aggregated conformation. Site-directed
mutagenesis studies have shown that the disulfide isomerase activity of PDI is not
required for the assembly or activity of the collagen prolyl 4-hydroxylase
tetramer [65]. Likewise, the C-terminal extension c is not required for tetramer
assembly [69], the minimum requirement being fulfilled by a pair of PDI
domains, b′a′ [70].

Besides its critical role as the β subunit of collagen prolyl 4-hydroxylase, PDI
has additional important roles in collagen synthesis. It catalyzes intrachain 
disulfide bond formation in the N and C propeptides of procollagens and sta- 
bilizes procollagen trimers through the formation of interchain disulfide bonds 
between the C propeptides, and between the C telopeptides in some specific 
cases [1-4]. Furthermore, PDI interacts specifically with the C propeptides 
prior to trimer formation, retains unassembled chains within the ER, and thus 
directly prevents secretion until the correct triple-helical structure has been 
achieved [71, 72]. 

3.1.2
Non-Vertebrate Collagen Prolyl 4-Hydroxylases 


Two major collagen families are present in the nematode Caenorhabditis ele- 
gans, the cuticle collagens and basement membrane collagens. These are en- 
coded by a large gene family of about 180 members, three of them encoding 
basement membrane collagens and the rest cuticle collagens [4, 73, 74]. The 
C. elegans genome contains two conserved genes, phy-1 and phy-2, encoding 
polypeptides with a high sequence similarity to the vertebrate collagen prolyl 
4-hydroxylase α subunits, while the products of two additional genes, encoding
α subunit-like polypeptides, show a lower similarity [75]. The phy-1 and phy-2
genes, and one with lower similarity, phy-3, have been cloned and characterized 
[76-81], while characterization of the other gene with lower similarity, phy-4, is 
in progress (Keskiaho K, Kukkola L, Page AP, Winter AD, Nissi R, Myllyharju J, 
unpublished data). PDI has two isoforms in C. elegans, PDI-1 and PDI-2, both 
having disulfide isomerase activity and PDI-2 also serving as the β subunit in
the forms of collagen prolyl 4-hydroxylase that are involved in the synthesis of 
cuticle collagens [74, 77, 80, 82]. The PHY-1, PHY-2 and PDI-2 polypeptides are 
expressed in the cuticle collagen-synthesizing hypodermal cells at times of max- 
imal collagen synthesis in larval stages 1-4 and in adult nematodes [77, 80]. 

The C. elegans PHY-1 and PHY-2 polypeptides consist of 543 and 523 residues,
respectively, with signal peptides of 16 additional residues (Fig. 3) [76-78]. The
degree of amino acid sequence identity between PHY-1 and PHY-2 is 54%,
while that between PHY-1 and PHY-2 and the human α(I) and α(II) subunits
is 42-46% [76, 77]. Surprisingly, recombinant expression of the C. elegans PHY-1
polypeptide in insect cells together with human PDI or C. elegans PDI-2 led
to the assembly of an active αβ dimer instead of an α2β2 tetramer (Fig. 2) [76,
82]. Recombinant expression and in vivo studies showed, however, that the
main collagen prolyl 4-hydroxylase form in C. elegans is a unique
PHY-1/PHY-2/(PDI-2)2 mixed tetramer, along with a small amount of the PHY-1/PDI-2
dimer, while the PHY-2/PDI-2 dimer was not detected (Fig. 2) [80].
Homozygous inactivation of either the phy-1 or phy-2 gene prevents assembly of the
mixed tetramer, but the mutants can in part compensate for its absence by
increased assembly of the corresponding PHY/PDI-2 dimer [80]. The phy-1
mutants do this only very ineffectively, however, and have a short, fat phenotype,
dumpy, whereas the phy-2 mutants have a wild-type phenotype [80]. The
phy-1-/-, phy-2-/- double null [77, 78, 80] and the pdi-2-/- null [77] mutants lack all
the collagen prolyl 4-hydroxylase forms needed for the synthesis of cuticle
collagens and are therefore embryonically lethal.

The third C. elegans α subunit homologue characterized, PHY-3, encodes a
polypeptide of only 295 residues, with a signal peptide of 23 additional residues
(Fig. 3) [81]. It shows a 17-20% amino acid sequence identity to the approximately
290 C-terminal residues in the vertebrate α subunits and the C. elegans PHY-1 and
PHY-2 polypeptides [81]. The recombinant PHY-3 polypeptide does not
associate with C. elegans PDI-2, but instead has been shown in recombinant
coexpression experiments to form an active collagen prolyl 4-hydroxylase with C.
elegans PDI-1 that does not associate with PHY-1 or PHY-2 [80, 81]. The mol- 
ecular composition of the active enzyme formed by PHY-3 and PDI-1 is still 
unknown, however, and it is also possible that PHY-3 may be a monomeric en- 
zyme requiring PDI-1 only as a chaperone to assist in its folding (Fig. 2) [81]. 
Homozygous phy-3 null nematodes are fertile and have a wild-type phenotype 
with a normal 4-hydroxyproline content in the cuticle, but the 4-hydroxypro- 
line content of early mutant embryos was reduced by approximately 90% [81]. 
The phy-3 gene is expressed in embryos, late larval stages and adult nematodes, 
its expression in adults being restricted to the spermatheca, a specialized region 
of the gonad where oocytes are fertilized [81]. PHY-3 therefore has no role in 
the synthesis of cuticle collagens but is likely to be involved in the hydroxyla- 
tion of proline residues in early embryos, probably those in the eggshell colla- 
gens [81]. 

Collagen prolyl 4-hydroxylase α subunits have also been cloned and
characterized from two parasitic filarial nematodes, Onchocerca volvulus and Brugia
malayi (Fig. 3), the enzyme in the latter being a highly unusual α4 homotetramer
that is soluble and active without PDI (Fig. 2) [83,84]. Small-molecule inhibitors 
of collagen prolyl 4-hydroxylase (see below) have similar cuticle-specific dele- 
terious effects in both C. elegans and B. malayi, and thus this enzyme is an 
excellent target for the control of parasitic nematodes by chemical inhibition 
[78, 80, 83]. 

The collagen gene family of Drosophila melanogaster contains only four
members, coding for two α chains of type IV collagen, one homologue of type
XV and XVIII collagens and pericardin, a protein in which the collagen domain
shows some similarity to type IV [4, 85-87]. It is therefore highly surprising
that approximately 20 genes encoding polypeptides of 480-550 residues with
a similarity to the vertebrate collagen prolyl 4-hydroxylase α subunits exist in
the D. melanogaster genome [88]. Only one of the encoded polypeptides has
been characterized in detail [89], and this consists of 516 residues with a
signal peptide of an additional 19 residues and shows 34-35% and 31% amino
acid sequence identities to the vertebrate α subunits and C. elegans PHY-1,
respectively (Fig. 3) [89]. This D. melanogaster α subunit assembles into an
active collagen prolyl 4-hydroxylase tetramer with PDI in recombinant
expression (Fig. 2) [89]. Many of the D. melanogaster genes encoding α subunit-like
polypeptides show tissue-specific and developmental stage-specific expression 
[88, 89]. The D. melanogaster collagen prolyl 4-hydroxylase family appears to 
be markedly larger than the corresponding vertebrate and C. elegans families, 
even though the number of collagen genes in D. melanogaster is much smaller. 
It may therefore be that many of the D. melanogaster prolyl 4-hydroxylases hy- 
droxylate proline residues in proteins other than the collagens, and detailed 
studies on these enzymes may lead to the identification of additional functions 
for the human prolyl 4-hydroxylases as well. 

3.2
Lysyl Hydroxylase 


Lysyl hydroxylase has been purified to homogeneity from chick embryos and 
human placental tissues by two affinity column procedures [6, 31]. The active 
enzyme is a homodimer with a molecular weight of about 170 kDa, consisting 
of two identical monomers with a molecular weight of about 85 kDa. Lysyl 
hydroxylase is located in the lumen of the ER, but unlike the collagen prolyl 
4-hydroxylases, it is not a soluble ER luminal protein but a luminally-oriented 
peripheral membrane protein [6]. A fully active recombinant lysyl hydroxylase 
can be efficiently produced in insect cells, and the use of a baculoviral signal 
peptide leads to efficient secretion of the enzyme into the culture medium, 
from which it can be easily purified [90-92]. 


3.2.1 
Vertebrate Lysyl Hydroxylases 


Vertebrate lysyl hydroxylase was first cloned from chicken and later from hu- 
man, rat and mouse sources [93-97]. Many early findings suggested the pos- 
sible existence of tissue-specific lysyl hydroxylase isoenzymes, e.g. the large 
differences in the extent of lysine hydroxylation found between collagen types, 
and even within the same collagen type from different tissues, and the wide 
variation in collagen hydroxylysine deficiency in different tissues of patients 
with the kyphoscoliotic type of Ehlers-Danlos syndrome (see below). Like 
collagen prolyl 4-hydroxylases, lysyl hydroxylase is now also known to form an 
enzyme family, since two novel isoenzymes, lysyl hydroxylases 2 and 3, have 
recently been cloned and characterized [98-100]. 

The human lysyl hydroxylase 1, 2 and 3 polypeptides consist of 709, 712 and
714 residues, respectively, with signal peptides of additional 18, 25 and 24
residues (Fig. 5) [94, 95, 98-100]. The overall amino acid sequence identity
between the processed human lysyl hydroxylase 1 and 2 polypeptides is 75%, and
that between lysyl hydroxylase 3 and the other two isoenzymes 57-59%
[98-100]. Identity is highest within the catalytically important C-terminal
region, being over 90% between lysyl hydroxylases 1 and 2, and 69-72% between
lysyl hydroxylase 3 and the other two, all four catalytically critical residues
being conserved (Fig. 5, see below) [98-100]. The vertebrate collagen prolyl
4-hydroxylase α subunits and lysyl hydroxylases show no significant amino acid
sequence similarity with the exception of the catalytically critical residues.
Further variation within the lysyl hydroxylase family is caused by alternative 
splicing. A novel form of human lysyl hydroxylase 2, termed lysyl hydroxylase 
2b, is a 733-residue polypeptide, the increase in size being caused by sequences 
encoded by an additional exon, 13A, that is located between exons 13 and 14 in 
the originally described lysyl hydroxylase 2a cDNA [101]. 

The three human lysyl hydroxylase isoenzymes contain several conserved
cysteine residues that may be structurally important (Fig. 5) [98-100]. All three
have potential asparagine-linked glycosylation sites, isoenzymes 1, 2 and 3
having four, seven and two sites, respectively (Fig. 5) [94, 95, 98-100]. A
considerable heterogeneity is found in the glycosylation of human lysyl hydroxylase 1
[6], and site-directed mutagenesis studies have shown that some of the
asparagine-linked carbohydrate units in a recombinant polypeptide are required
for maximal enzyme activity [90]. Although the lysyl hydroxylases reside
within the lumen of the ER, they do not contain a traditional ER retention
motif, and it has been suggested that they are bound to the ER membrane by weak
electrostatic interactions [102]. The 40 C-terminal residues of the lysyl
hydroxylase 1 polypeptide have been shown to be necessary for its membrane
association and localization in the ER [103, 104].

Fig. 5 Schematic representation of the human lysyl hydroxylase 1, 2 and 3 and C. elegans
lysyl hydroxylase polypeptides. Numbering of the amino acids starts with the first residue
of the processed polypeptide. Cysteine residues and potential attachment sites for
asparagine-linked oligosaccharides are shown below the polypeptides, and the catalytically
critical residues are shown above them. The domain structures of the human lysyl
hydroxylase polypeptides are indicated by A, B and C

Limited proteolysis experiments on recombinant lysyl hydroxylase isoen- 
zymes have indicated that the polypeptides consist of three domains A-C (from 
the N to C terminus), with molecular weights of approximately 30, 37 and 
16 kDa (Fig. 5) [10]. The N-terminal domain A has no role in lysyl hydroxylase 
activity, as a recombinant B-C polypeptide was found to represent a fully active 
lysyl hydroxylase with catalytic properties identical to those of the full-length 
enzyme [10]. Lysyl hydroxylase 3, but not the other two isoenzymes, has also 
been reported recently to have small amounts of collagen glucosyltransferase 
and galactosyltransferase activity [9-11], these activities residing entirely in the
N-terminal domain A [10]. 

The human genes encoding lysyl hydroxylases 1, 2 and 3 are located on
chromosomes 1p36.2-1p36.3, 3q23-q24 and 7q36, respectively [94, 100, 105]. Their
mRNAs are expressed in a variety of human tissues, the highest expression
levels for lysyl hydroxylase 1 mRNA being found in the liver and skeletal muscle
[106], those for lysyl hydroxylase 2 mRNA in the pancreas, placenta, heart and
skeletal muscle [98], and those for lysyl hydroxylase 3 in the pancreas, placenta,
heart and spinal cord [99, 100]. No clear differences in tissue specificity or
collagen type specificity have so far been identified between the three isoenzymes,
but two mutations in the lysyl hydroxylase 2 gene have recently been reported
in the autosomal recessive Bruck syndrome, which involves underhydroxylation
of lysine residues within the telopeptides of type I collagen in bone and
is characterized by fragile bones, joint contractures, scoliosis and osteoporosis 
[107]. This suggests that lysyl hydroxylase 2 may be a telopeptide lysyl hy- 
droxylase with a tissue-specific expression pattern. Furthermore, it has been 
shown very recently that mice lacking lysyl hydroxylase 3 activity die during 
embryogenesis due to a lack of type IV collagen in their basement membranes 
[108]. This phenotype resembles that seen in C. elegans when its only lysyl hy- 
droxylase is inactivated (see below). 

Ehlers-Danlos syndrome is a heterogeneous group of heritable connective 
tissue disorders characterized by articular hypermobility, skin extensibility and 
tissue fragility [109]. The kyphoscoliotic type of Ehlers-Danlos syndrome, 
characterized by scoliosis, generalized joint laxity, skin fragility, ocular mani- 
festations and severe muscle hypotonia, is caused by mutations in the lysyl hy- 
droxylase 1 gene [109, 110]. The most common of these mutations leads to du- 
plication of seven exons due to an Alu-Alu recombination in introns 9 and 16, 
which contain many Alu repeats [106, 111, 112]. The introns of the lysyl hy- 
droxylase 3 gene are shorter than those of the lysyl hydroxylase 1 gene, but they 
also contain numerous Alu repeats [113]. 


3.2.2 
Caenorhabditis elegans Lysyl Hydroxylase 


The nematode C. elegans has a single gene encoding lysyl hydroxylase, let-268
[99, 114]. The encoded polypeptide consists of 714 residues and a signal
peptide of 16 additional residues [99, 114]. Its sequence shows 45% overall identity
to those of the human lysyl hydroxylases 1-3, the identity being highest, 65%,
within the 100 C-terminal residues [99, 114]. Inactivation of the let-268 gene
leads to the retention of type IV collagen in the producing cells, which indicates
that lysyl hydroxylase activity is required for proper processing and secretion 
of type IV collagen [114]. The resulting lack of type IV collagen in the C. ele- 
gans basement membranes leads to separation of the muscle cells from the un- 
derlying epidermal layer upon body wall muscle contraction and failure to 
complete embryogenesis [114]. 

3.3
Prolyl 3-Hydroxylases 


Prolyl 3-hydroxylase was purified about 5000-fold and partially characterized
from a chick embryo extract more than 25 years ago, and its molecular
weight was estimated to be about 160 kDa [6, 17]. However, prolyl
3-hydroxylase was cloned from chicken only very recently, and it was found to be
a chick homolog of leprecan, also known as growth suppressor 1 [8]. Leprecan, 
which is now called prolyl 3-hydroxylase 1, has two additional family mem- 
bers, prolyl 3-hydroxylases 2 and 3, previously called MLAT4 and GRBC in 
human, mouse and chicken [8]. The lengths of the processed human prolyl 
3-hydroxylase 1, 2 and 3 polypeptides are 736, 708 and 736 amino acids, 
respectively, each having a C-terminal ER retention signal -Lys-Asp-Glu-Leu 
[8]. The sequence identity between the human prolyl 3-hydroxylase 1 and 2 
polypeptides is 46% and that between the prolyl 3-hydroxylase 2 and 3
polypeptides 38%. The critical iron and 2-oxoglutarate binding residues are all
conserved (see below), the 2-oxoglutarate binding residue being an arginine. 
Prolyl 3-hydroxylase 1 is localized specifically to tissues that express fibril- 
forming collagens, suggesting that it may serve to modify these collagens, 
while some of the other two isoenzymes may be involved in modifying the 
type IV basement membrane collagen [8]. 


3.4 
Peptide Substrates of Collagen Hydroxylases 


The collagen hydroxylases act on proline or lysine residues only in peptide link- 
ages, and none of them hydroxylates the corresponding free amino acid. Col- 
lagen prolyl 4-hydroxylase does not act on tripeptides with the structure Gly- 
X-Pro or Pro-Gly-X in vitro, whereas X-Pro-Gly tripeptides are hydroxylated [5, 
6, 17]. Likewise, lysyl hydroxylase does not act on a Lys-Gly-Pro tripeptide, 
while the X-Lys-Gly tripeptides are hydroxylated [6, 17]. Studies with various 
peptides have shown that the minimum sequence requirements for collagen 
prolyl 4-hydroxylase and lysyl hydroxylase are -X-Pro-Gly- and -X-Lys-Gly-, re- 
spectively [6, 17]. In addition, collagen prolyl 4-hydroxylase can hydroxylate the 
tetrapeptides Pro-Pro-Ala-Pro and Pro-Pro-Glu-Pro at a low rate, agreeing with 
the presence of a few -X-4Hyp-Ala- sequences in some collagen polypeptide 
chains and in the subcomponent C1q of complement [6, 17]. Lysyl hydroxylase
can also hydroxylate the -X-Lys-Ala- and -X-Lys-Ser- sequences found in the 
telopeptides of fibril-forming collagens [6, 17]. Thus, the glycine in the -X-Y- 
Gly- sequence can in some rare cases be replaced by other amino acids. The in- 
teraction of collagen prolyl 4-hydroxylase and lysyl hydroxylase is also affected 
by the amino acid in the X position of the triplet to be hydroxylated and by 
other nearby amino acids [6, 17]. The chain length of the peptide has a major 
effect on the Km values, which decrease with increasing chain length (Table 1)
[5-7, 17]. The conformation of the peptide substrate has a crucial effect on
hydroxylation, in that the triple-helical conformation completely prevents
proline and lysine hydroxylation [5-7, 17].

Table 1 Km and Ki values of human type I-III collagen prolyl 4-hydroxylases
(P4H-I-III) for (Pro-Pro-Gly)n and poly(L-proline), respectively

Substrate or inhibitor           Constant   P4H-I     P4H-II   P4H-III
                                            Km or Ki (µmol/l)

(Pro-Pro-Gly)5                   Km         150-250   490      N.D.
(Pro-Pro-Gly)10                  Km         18        95       20
Protocollagen                    Km         0.2       1.1      N.D.
Poly(L-proline), Mr 5000-7000    Ki         0.5       95       30
Poly(L-proline), Mr 44,000       Ki         0.02      20       N.D.

Values are from [31], [44], [45] and [51]. N.D., not determined. Protocollagen
is a biologically prepared substrate consisting of nonhydroxylated procollagen
chains of chick type I procollagen [31].
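The minimum sequence requirements given above, -X-Pro-Gly- for collagen prolyl 4-hydroxylase and -X-Lys-Gly- for lysyl hydroxylase, can be illustrated with a short Python sketch that lists candidate residues in a peptide written in one-letter code. This is only an illustration: the function name and the test peptide are hypothetical, and the sketch deliberately ignores the rare -X-Pro-Ala-, -X-Lys-Ala- and -X-Lys-Ser- sites as well as the effects of neighbouring residues, chain length and conformation.

```python
import re

# Minimum sequence requirements quoted in the text:
#   collagen prolyl 4-hydroxylase: -X-Pro-Gly-
#   lysyl hydroxylase:             -X-Lys-Gly-
# One-letter codes: P = Pro, K = Lys, G = Gly.


def candidate_sites(sequence, target):
    """Return 1-based positions of `target` residues in an X-target-Gly triplet.

    `candidate_sites` is a hypothetical helper written for this illustration.
    """
    if target not in ("P", "K"):
        raise ValueError("target must be 'P' (proline) or 'K' (lysine)")
    # Lookbehind requires a preceding residue X, lookahead requires Gly,
    # so neighbouring triplets are all reported.
    pattern = re.compile(r"(?<=[A-Z])" + target + r"(?=G)")
    return [match.start() + 1 for match in pattern.finditer(sequence)]


peptide = "PPGPPGIKGPAG"  # hypothetical test peptide, not taken from the text
print("candidate Pro positions:", candidate_sites(peptide, "P"))
print("candidate Lys positions:", candidate_sites(peptide, "K"))
```

Applied to the tripeptides mentioned in the text, the same rule reports no site for Gly-X-Pro or Pro-Gly-X and one site for X-Pro-Gly.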

Some distinct differences in peptide-binding properties have been observed 
between the three vertebrate collagen prolyl 4-hydroxylase isoenzymes. The Km 
values of the type I and type III enzymes for the peptide substrate
(Pro-Pro-Gly)10 are quite similar, while that of the type II enzyme is about fivefold higher
(Table 1) [43-45]. Poly(L-proline) is a highly effective competitive inhibitor of
the vertebrate type I collagen prolyl 4-hydroxylase, the Ki decreasing with
increasing chain length (Table 1) [5-7, 17]. In contrast, the type II enzyme is
inhibited by poly(L-proline) only at a much higher concentration, while the
type III enzyme is inhibited to an intermediate degree by
poly(L-proline) [43-45]. These findings suggest that distinct differences in the
structures of the peptide-binding sites must exist between the three collagen prolyl
4-hydroxylase isoenzymes. 
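To make the difference between the isoenzymes concrete, the following Python sketch applies the standard rate law for competitive inhibition, v/Vmax = [S] / (Km(1 + [I]/Ki) + [S]), to the type I and type II enzymes. The Km values for (Pro-Pro-Gly)10 and the Ki values for poly(L-proline), Mr 5000-7000, are those read from Table 1; the substrate and inhibitor concentrations are arbitrary assumptions chosen for illustration, and treating the reaction as a single-substrate Michaelis-Menten system is a simplification, since the cosubstrates are assumed to be saturating.

```python
def relative_rate(s, km, i=0.0, ki=None):
    """Return v/Vmax for Michaelis-Menten kinetics with an optional competitive inhibitor.

    A competitive inhibitor raises the apparent Km by the factor (1 + i/ki).
    All concentrations must share one unit (here µmol/l).
    """
    km_app = km if ki is None else km * (1.0 + i / ki)
    return s / (km_app + s)


# Km for (Pro-Pro-Gly)10 and Ki for poly(L-proline), Mr 5000-7000 (µmol/l, Table 1)
enzymes = {
    "P4H-I": {"km": 18.0, "ki": 0.5},
    "P4H-II": {"km": 95.0, "ki": 95.0},
}

substrate = 100.0  # µmol/l, assumed for illustration
inhibitor = 5.0    # µmol/l, assumed for illustration

for name, params in enzymes.items():
    uninhibited = relative_rate(substrate, params["km"])
    inhibited = relative_rate(substrate, params["km"], inhibitor, params["ki"])
    print(f"{name}: fraction of activity remaining = {inhibited / uninhibited:.2f}")
```

Under these assumed concentrations the type I enzyme loses most of its activity while the type II enzyme is barely affected, which is the qualitative pattern described in the text.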

The peptide-substrate-binding domain in the collagen prolyl 4-hydroxylases 
is distinct from the catalytic domain and is located between residues Phe144
and Ser244 in the human α(I) subunit (Fig. 3) [50]. Site-directed mutagenesis
studies have shown that most, although not all, of the differences in peptide
binding between the type I and type II collagen prolyl 4-hydroxylases can be
attributed to the presence of a glutamate and glutamine in the α(II) subunit
positions corresponding to Ile182 and Tyr233 in α(I) [50]. The Kd values
determined for the binding of several synthetic peptides to the recombinant α(I)
and α(II) peptide-substrate-binding domains are very similar to the Km and Ki
values of these peptides as substrates and inhibitors of the type I and type II
enzyme tetramers [51]. The Kd value of the α(I) peptide-substrate-binding
domain for a hydroxylated peptide was found to be much higher than that for a
non-hydroxylated one, indicating a marked decrease in the affinity of
hydroxylated peptides for the domain [51]. It is thus evident that many
characteristic features of the binding of peptides to the vertebrate collagen prolyl
4-hydroxylase isoenzymes can be explained by the properties of binding to the
peptide-substrate-binding domain rather than the catalytic domain [51]. Whether a
similar domain exists in the lysyl hydroxylases remains to be established. 

The peptide-substrate-binding domain of the collagen prolyl 4-hydroxylase 
α(I) subunit shows no amino acid sequence similarity to those of other
proline-rich peptide-binding domains or profilin [50, 115-120]. It consists of five
α helices and one short putative β strand, and thus its secondary structure is also
distinct from the known structures of other proline-rich peptide-binding
domains and profilin, which consist mainly of β strands [51, 115-120]. The other
proline-rich peptide-binding modules typically bind their ligand via critical
aromatic residues located on a hydrophobic patch [115-120]. NMR analyses
suggest that the α(I) peptide-substrate-binding domain has similar
characteristic features, since binding of (Pro-Pro-Gly)n to this domain caused
chemical shift changes mainly in hydrophobic residues [51]. The crystal
structure of the α(I) peptide-substrate-binding domain has recently been solved and
it has been found to consist of five α helices and belong to a family of
tetratricopeptide repeat domains that are involved in many protein-protein
interactions [121]. The peptide substrates and the competitive inhibitor
poly(L-proline) are suggested to bind to a groove lined by tyrosines, the
side-chains of which have a repeat distance similar to that of a poly(L-proline)
type II helix [121]. 

Prolyl 3-hydroxylase hydroxylates -Pro-4Hyp-Gly- sequences but not -Pro- 
Pro-Gly- sequences [6, 17]. This is in agreement with the existence of 3-hy- 
droxyproline in collagens only in the sequence -Gly-3Hyp-4Hyp-Gly- [6, 17]. As
in the case of collagen prolyl 4-hydroxylase and lysyl hydroxylase, proline 3-hy- 
droxylation is also affected by the amino acids in the nearby triplets, and by the 
length and conformation of the substrate [6, 17]. 


3.5 
Cosubstrates and Reaction Mechanisms 


Collagen prolyl 4-hydroxylase, prolyl 3-hydroxylase and lysyl hydroxylase be- 
long to the group of 2-oxoglutarate dioxygenases, of which collagen prolyl 4- 
hydroxylase was one of the earliest to be discovered, is most extensively stud- 
ied, and has often been regarded as a model for investigating other enzymes in 
the group [5-7]. All three collagen hydroxylases require Fe2+, 2-oxoglutarate, O2
and ascorbate (Fig. 6), and their Km values for these cosubstrates are very
similar (Table 2).

Fig. 6 Reactions catalyzed by collagen prolyl 4-hydroxylase, prolyl 3-hydroxylase
and lysyl hydroxylase

Table 2 Km values of human type I-III collagen prolyl 4-hydroxylases (P4H-I-III), human
lysyl hydroxylases 1 and 3 (LH1 and 3) and chick prolyl 3-hydroxylase (P3H) for
cosubstrates. The values for collagen prolyl 4-hydroxylases and lysyl hydroxylases have been
determined with synthetic peptides (Pro-Pro-Gly)10 and (Ile-Lys-Gly)3 as substrates,
respectively, whereas those for prolyl 3-hydroxylase have been determined with a protocollagen
substrate, the values for this enzyme being therefore slightly lower

Cosubstrate       P4H-I   P4H-II   P4H-III   LH1    LH3    P3H
                  Km (µmol/l)

Fe2+              2       2        0.5       2      2      2
2-Oxoglutarate    20      22       20        100    100    3
Ascorbate         300     340      370       350    300    120
O2                40      N.D.     N.D.      45     N.D.   30

Values are from [17], [30], [44], [45], [49], [56] and [99]. N.D., not determined.
The O2 value listed for LH1 was determined with chick LH1.

Fig. 7 Reactions catalyzed by collagen prolyl 4-hydroxylase. 2-Oxoglutarate is
stoichiometrically decarboxylated during the hydroxylation reaction, which does not require
ascorbate (A). The enzyme also catalyzes the uncoupled decarboxylation of 2-oxoglutarate
without subsequent hydroxylation of the peptide substrate. Ascorbate is stoichiometrically
consumed in the uncoupled reaction, which may occur in either the presence (B) or absence (C)
of the peptide substrate

The 2-oxoglutarate is stoichiometrically decarboxylated during hydroxylation,
with one atom of the O2 molecule being incorporated into the succinate
and the other into the hydroxy group formed on the proline or lysine residue
(Fig. 6) [5-7, 17]. Ascorbate is not consumed stoichiometrically, and the
enzymes can complete a number of reaction cycles at an essentially maximal rate
in its absence [5-7, 17]. Hydroxylation then ceases, however, and ascorbate is
required to reactivate the enzyme [5, 6, 17]. The reaction requiring ascorbate
is an uncoupled decarboxylation of 2-oxoglutarate, i.e., decarboxylation
without subsequent hydroxylation, in which ascorbate is consumed
stoichiometrically (Fig. 7). The collagen hydroxylases catalyze uncoupled decarboxylation
cycles even in the presence of saturating concentrations of the peptide
substrates, and in these cycles the reactive iron-oxo complex is probably converted
to Fe3+·O−, rendering the enzymes unavailable for new catalytic cycles until
reduced by ascorbate [5-7, 17]. The main biological function of ascorbate in the
collagen hydroxylase reactions is therefore probably to serve as an alternative
oxygen acceptor in the uncoupled decarboxylation cycles [5-7, 17].
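The coupled and uncoupled reactions described in this paragraph can be written schematically as follows, taking prolyl 4-hydroxylation as the example. The notation is only a sketch: the text states that ascorbate is consumed stoichiometrically in the uncoupled reaction but does not name its oxidation product, so writing it as dehydroascorbate, with the second oxygen atom ending up in water, is an assumption made here to balance the scheme.

```latex
\begin{align*}
\text{coupled:}\quad
  & \text{Pro-peptide} + \text{2-oxoglutarate} + \mathrm{O_2}
    \xrightarrow{\;\mathrm{Fe^{2+}}\;}
    \text{4Hyp-peptide} + \text{succinate} + \mathrm{CO_2} \\
\text{uncoupled:}\quad
  & \text{2-oxoglutarate} + \mathrm{O_2} + \text{ascorbate}
    \xrightarrow{\;\mathrm{Fe^{2+}}\;}
    \text{succinate} + \mathrm{CO_2} + \text{dehydroascorbate} + \mathrm{H_2O}
    % dehydroascorbate and H2O: assumed products, not named in the text
\end{align*}
```

In both lines one oxygen atom of O2 ends up in succinate; in the coupled reaction the other forms the hydroxy group on the peptide, whereas in the uncoupled reaction ascorbate takes the place of the peptide as the oxygen acceptor.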

Kinetic studies of the collagen prolyl 4-hydroxylase and lysyl hydroxylase re- 
actions have shown that the cosubstrates and the peptide substrate become 
bound to the enzymes in an ordered manner, Fe2+ binding first, followed by
2-oxoglutarate, O2 and the peptide substrate, and the reaction products are
released in the reverse order, although Fe2+ is not released between most catalytic
cycles [6, 17]. The catalytic cycle can be divided into two half-reactions: initial 
generation of the reactive hydroxylating species and its subsequent utilization 
for hydroxylation of the target residue (Fig. 8) [5-7, 17]. 

The current stereochemical model for a catalytic site in the collagen hy- 
droxylases suggests that it consists of a set of separate locations for binding of 
the cosubstrates [5-7, 17, 122, 123]. The Fe2+ is located in a pocket coordinated
with three side chains (Fig. 8). Site-directed mutagenesis studies and other data
have indicated that His412, Asp414, and His483 are the Fe2+ binding residues in
the human α(I) subunit (Figs. 3 and 8), while the corresponding ligands in the
lysyl hydroxylase 1 polypeptide are His638, Asp640 and His690 (Fig. 5) [49, 54,
90, 124]. The 2-oxoglutarate binding site can be divided into three subsites [122,
123, 125]: subsite I is a positively charged residue of the enzyme that ionically
binds the C5 carboxyl group of the 2-oxoglutarate, subsite II consists of two
cis-positioned coordination sites of the enzyme-bound Fe2+ and is chelated by the
C1-C2 moiety, while subsite III involves a hydrophobic binding site in the
C3-C4 region of 2-oxoglutarate (Fig. 8) [122, 123, 125]. Site-directed mutagenesis
studies indicate that subsite I is formed by Lys493 in the human collagen
prolyl 4-hydroxylase α(I) subunit and by Arg700 in the corresponding position in
human lysyl hydroxylase 1 (Figs. 3 and 5) [49, 126]. The three Fe2+ binding
residues are conserved in all the collagen prolyl 4-hydroxylase α subunit
isoforms and lysyl hydroxylase isoenzymes characterized so far (Figs. 3 and 5), as
well as in other 2-oxoglutarate dioxygenases characterized, and a related
enzyme, isopenicillin N synthase [27-29, 127-135]. The positively charged residue
forming subsite I of the 2-oxoglutarate binding site is likewise conserved in
position +9 or +10 with respect to the second Fe2+ binding histidine in all
2-oxoglutarate dioxygenases with the exception of HIF asparaginyl hydroxylase in
which it is located in position +15 with respect to the first Fe2+ binding
histidine [27-29, 127-135]. Determination of the crystal structures of several
2-oxoglutarate dioxygenases and isopenicillin N synthase has verified the critical
role of these conserved residues in Fe2+ and 2-oxoglutarate binding [127-135].
Although the overall amino acid sequence similarity between the enzymes is
very low, their catalytic sites are all located in a jelly-roll motif formed by eight
β strands. It is thus probable that a similar jelly-roll core can also be found at
the catalytic sites of the collagen hydroxylases. 
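The spacing rule for the conserved residues can be checked directly from the residue numbers quoted above. The short Python sketch below does only this arithmetic; the dictionary layout and the function name are hypothetical, and no sequence data beyond the numbers given in the text are used.

```python
# Residue numbers quoted in the text for the human enzymes.
enzymes = {
    "collagen P4H alpha(I)": {"his1": 412, "asp": 414, "his2": 483, "subsite_I": 493},
    "lysyl hydroxylase 1": {"his1": 638, "asp": 640, "his2": 690, "subsite_I": 700},
}


def spacing(sites):
    """Return the sequence offsets between the conserved catalytic residues."""
    return {
        "his1_to_asp": sites["asp"] - sites["his1"],
        "his1_to_his2": sites["his2"] - sites["his1"],
        "his2_to_subsite_I": sites["subsite_I"] - sites["his2"],
    }


for name, sites in enzymes.items():
    offsets = spacing(sites)
    # The text states that the subsite I residue lies +9 or +10 from the second histidine.
    within_rule = offsets["his2_to_subsite_I"] in (9, 10)
    print(name, offsets, "subsite I at +9/+10:", within_rule)
```

Both enzymes show a His-X-Asp arrangement for the first two iron ligands and place the subsite I residue (Lys493 or Arg700) ten residues after the second histidine, in line with the +9 or +10 rule.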

The His501 residue in the human collagen prolyl 4-hydroxylase α(I) subunit
(Fig. 3) is an additional critical residue [49, 54], which probably has two roles
at the catalytic site: it directs the orientation of the C1 carboxyl group of
2-oxoglutarate to the active iron centre, and it accelerates the breakdown of the
tetrahedral ferryl intermediate to succinate, CO2 and a ferryl ion (Fig. 8B) [49].
The ascorbate binding site of the collagen hydroxylases probably also contains 
the two cis-positioned coordination sites of the enzyme-bound iron discussed 
above, and is thus partially identical to subsite II of the 2-oxoglutarate binding 
site [136]. 

Molecular oxygen is thought to become bound to the Fe2+ end-on in an axial
position, producing the dioxygen unit, a species characterized by doubly
occupied orbitals (Fig. 8) [122, 123]. One of the electron-rich orbitals of the
dioxygen unit is directed to the electron-depleted orbital at C2 of the 2-oxoglutarate
bound to the iron atom (Fig. 8A), so that the C2 undergoes rehybridization
from its sp2 hybridized planar oxo structure to an sp3 hybridized tetrahedral
transition state and forms a covalent bond with the non-coordinated atom of
the dioxygen unit (Fig. 8B). This weakens both the C-C bond in the
2-oxoglutarate and the O-O bond in the dioxygen unit [122, 123]. Decarboxylation then
occurs concurrently with cleavage of the O-O bond, and the original C2 of
2-oxoglutarate, which has become the C1 of succinate, returns to the sp2
hybridization. Simultaneously, a highly reactive ferryl ion is formed (Fig. 8B),
which hydroxylates the proline or lysine residue in the peptide substrate in the
second half-reaction [122, 123]. This second half-reaction renews the
enzyme-bound Fe2+ and concludes the catalytic cycle.


3.6 
Inhibitors and Inactivators 


The formation of scars and fibrous tissue is part of the normal healing process 
after injury, but in some situations collagen accumulates in excessive amounts, 
leading to fibrosis, which compromises the normal functioning of the affected 
tissue. The central role of collagen in fibrosis has prompted attempts to develop 
drugs that inhibit its accumulation, and the critical function of 4-hydroxypro- 
line in collagen has made collagen prolyl 4-hydroxylase an attractive target for 
antifibrotic therapy. Many compounds that inhibit collagen prolyl 4-hydroxy- 
lase, and in many cases the other two collagen hydroxylases as well, are now 
known, although no such inhibitors are yet in clinical use. 

Inhibitors of the collagen hydroxylases that are competitive with respect to the
peptide substrates and to each of the cosubstrates are now known. Many peptides
are competitive inhibitors with respect to the peptide substrate,
poly(L-proline), for example, being highly effective for the type I collagen prolyl
4-hydroxylase [5-7, 17], whereas its inhibitory potency with regard to the
vertebrate type II and type III enzymes (Table 1) (see above) and the C. elegans
and D. melanogaster collagen prolyl 4-hydroxylases is much weaker [43-45, 76,
80, 89]. Many bivalent cations are competitive inhibitors with respect to Fe2+
[5, 6, 17], the most potent being Zn2+ with a Ki of 1 µmol/l for collagen prolyl
4-hydroxylase and lysyl hydroxylase [137, 138]. Superoxide dismutase-active
copper chelates, including Cu(acetylsalicylate)2 and Cu(lysine)2, are competitive
inhibitors with respect to O2, the Ki of collagen prolyl 4-hydroxylase and lysyl
hydroxylase for both compounds being 30 µmol/l [139].

A number of aliphatic and aromatic structural analogues of 2-oxoglutarate 
are competitive inhibitors with respect to this cosubstrate (Fig. 9). The most 
potent of these, pyridine 2,4-dicarboxylate and pyridine 2,5-dicarboxylate, 
have functional groups that can interact at all three subsites of the
2-oxoglutarate binding site, their Ki values for collagen prolyl 4-hydroxylase being
2 µmol/l and 0.8 µmol/l, respectively [136, 140, 141]. In the cases of prolyl
3-hydroxylase and lysyl hydroxylase, pyridine 2,4-dicarboxylate, with Ki
values of 3 µmol/l and 50 µmol/l, respectively, is a more potent inhibitor than
pyridine 2,5-dicarboxylate [141]. Lysyl hydroxylase has higher Ki values for
almost all 2-oxoglutarate analogues than the two prolyl hydroxylases [141],
this being in accordance with the higher Km of lysyl hydroxylase for
2-oxoglutarate (Table 2). These and other data (see previous section) indicate that
the three collagen hydroxylases have similar but not identical 2-oxoglutarate
binding sites. Systematic variation of the structures of aromatic 2-oxoglutarate
analogues has indicated that the Ki increases markedly if the iron chelating
moiety is destroyed or the carboxyl group that becomes bound at subsite I (see 
previous section) is omitted or shifted to a position in which it cannot become 
bound at this subsite [140]. Several 2-oxoglutarate analogues also inhibit the 
other class of prolyl 4-hydroxylases, HIF prolyl 4-hydroxylases, but their 
inhibitory properties differ distinctly from those of the collagen prolyl 4-hy- 
droxylases [30]. It should thus be possible to develop potent small molecule 
inhibitors that show a high degree of specificity with respect to the two classes 
of prolyl 4-hydroxylases, enabling selective targeting of fibrotic or ischaemic 
diseases. 

N-Oxalylglycine (Fig. 9) is also a potent collagen prolyl 4-hydroxylase
inhibitor with a Ki of about 0.5-8 µmol/l [142, 143]. It differs from 2-oxoglutarate
only by replacement of the methylene group at C3 with -NH-, and is a
competitive inhibitor with respect to 2-oxoglutarate, but cannot replace it as a
cosubstrate [142, 143]. This is consistent with the reaction mechanism (see
previous section), in which decarboxylation involves a nucleophilic attack by an
electron pair in the dioxygen unit on the electron-deficient orbital at C2 of
2-oxoglutarate (Fig. 8). The C3 in 2-oxoglutarate is sp3 hybridized and cannot
contribute any electron density to the pz orbital of C2 [122, 123, 143]. In
oxalylglycine the C3 is replaced by the sp2 hybridized nitrogen atom, the pz orbital
of which has an electron pair that can contribute to the electron density of C2,
and therefore a nucleophilic attack is no longer possible [143]. Oxalylalanine,
which differs from oxalylglycine by carrying a methyl group at the carbon atom
that corresponds to the C4 of 2-oxoglutarate, is also an effective collagen
prolyl 4-hydroxylase inhibitor, with a Ki of 40 µmol/l, while the other oxalyl amino
acid derivatives show little inhibitory activity [142, 143].

Fig. 9 Structures of some inhibitors of collagen hydroxylases. (2-OG) 2-oxoglutarate, (A)
pyridine 2,4-dicarboxylate, (B) pyridine 2,5-dicarboxylate, (C) N-oxalylglycine, (D)
3,4-dihydroxybenzoate, (E) L-mimosine, (F) coumalic acid and (G) doxorubicin. Modified from
[5] with permission from Elsevier

3,4-Dihydroxybenzoate (Fig. 9) also possesses the functional groups re- 
quired for interaction at all three subsites of the 2-oxoglutarate binding site and 
has a Ki of about 5 µmol/l for collagen prolyl 4-hydroxylase [136]. This
compound and its derivatives differ from the pyridine derivatives in that they are
competitive with respect to both 2-oxoglutarate and ascorbate, whereas the lat- 
ter are uncompetitive with respect to ascorbate [136]. This conforms to the cur- 
rent stereochemical model for the catalytic site, according to which the ascor- 
bate binding site is partially identical to the 2-oxoglutarate binding site [122, 
123]. The dihydroxybenzoate and pyridine inhibitors probably become bound 
to different enzyme forms, as determined by the oxidation state of the iron 
atom at the catalytic site [136]. 
L-Mimosine (Fig. 9) inhibits collagen prolyl 4-hydroxylase by 50% at a
120 µmol/l concentration in addition to causing reversible, dose-dependent
inhibition of DNA synthesis due to inhibition of the hydroxylation of
deoxyhypusine to hypusine, a rare amino acid required by the eukaryotic initiation
factor eIF-5A and for cell cycle transition [144]. This compound thus inhibits both
collagen prolyl 4-hydroxylase and deoxyhypusyl hydroxylase, leading to both 
fibrosuppressive and antiproliferative effects [144]. 

Many of the known collagen hydroxylase inhibitors have low membrane 
permeability and are therefore ineffective in cultured cells and in vivo. The gen- 
eration of lipophilic proinhibitor derivatives of these compounds has been used 
to overcome this problem, however. The proinhibitors do not themselves act as 
inhibitors of the pure enzymes, but readily pass through cell membranes and 
become processed to the active inhibitor only within the cell. Such proin- 
hibitors include ethylpyridine 2,4-dicarboxylate [145], pyridine 2,4-dicar- 
boxylic acid di(methoxyethyl)amide (HOE 077) [146], dimethyloxalylglycine 
[143] and ethyl 3,4-dihydroxybenzoate [147, 148]. 

Three groups of compounds have also been identified that act as irre- 
versible inactivators of collagen prolyl 4-hydroxylase, probably by a suicide 
mechanism. The first group consists of peptides in which the proline residue 
to be hydroxylated is replaced by 5-oxaproline, a proline analogue containing 
oxygen as part of the five-membered ring [149]. The most potent of these 
peptides studied has the structure benzyloxycarbonyl-Phe-Oxaproline-Gly- 
benzylester and inactivates collagen prolyl 4-hydroxylase by 50% in 1 h at
0.8 µmol/l concentration [149]. Oxaproline-containing peptides also inactivate
the enzyme in cultured human skin fibroblasts, although a similar degree of
inactivation is obtained only at a concentration one order of magnitude higher
than that required with the pure enzyme [150]. The second group includes
coumalic acid (Fig. 9), which acts as a 2-oxoglutarate analogue but is only a
weak inactivator, due to the lack of a functional group needed to become
bound to the Fe2+ at the catalytic site, a 2 mmol/l concentration being required
for 50% inactivation in 1 h [151]. The third group consists of the
anthracyclines doxorubicin and daunorubicin (Fig. 9), which inactivate collagen
prolyl 4-hydroxylase and lysyl hydroxylase by 50% in 1 h at 60 µmol/l and
150 µmol/l concentrations, respectively [152]. This inactivation can be
prevented by high concentrations of ascorbate or low concentrations of its 
competitive analogues, whereas 2-oxoglutarate and its competitive analogues 
offer no protection [152]. 
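The inactivators above are all characterized by the concentration giving 50% inactivation in 1 h. If the loss of activity at a fixed inactivator concentration is assumed to follow pseudo-first-order kinetics, an assumption that is not stated in the text, the 1 h half-time can be converted into an observed rate constant and used to estimate the activity remaining at other incubation times. The function names in this Python sketch are hypothetical.

```python
import math


def k_obs_from_half_time(t_half_h):
    """Observed first-order rate constant (per hour) for a given half-time in hours."""
    return math.log(2) / t_half_h


def remaining_activity(t_h, t_half_h):
    """Fraction of enzyme activity left after t_h hours, assuming first-order decay."""
    return math.exp(-k_obs_from_half_time(t_half_h) * t_h)


# Concentrations (µmol/l) reported to give 50% inactivation of collagen
# prolyl 4-hydroxylase in 1 h; coumalic acid is 2 mmol/l = 2000 µmol/l.
half_inactivation_conc = {
    "oxaproline peptide": 0.8,
    "coumalic acid": 2000.0,
}

print(f"k_obs for a 1 h half-time: {k_obs_from_half_time(1.0):.3f} per hour")
print(f"activity left after 2 h: {remaining_activity(2.0, 1.0):.2f}")
ratio = half_inactivation_conc["coumalic acid"] / half_inactivation_conc["oxaproline peptide"]
print(f"coumalic acid needs about {ratio:.0f}-fold more compound than the oxaproline peptide")
```

The comparison of concentrations is taken directly from the figures in the text; only the time course relies on the first-order assumption.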

More recently identified inhibitors include the lithospermic acid magnesium 
salt isolated from Salviae miltiorrhizae Radix, a Chinese medicinal herb [153],
the organophosphate insecticides malathion and malaoxon [154], and a
heterocyclic carbonyl-glycine derivative S4682, which inhibits collagen prolyl
4-hydroxylase with a particularly low Ki of 155 nmol/l [155]. Minoxidil, an
antihypertensive piperidinopyrimidine nitrooxide, is unique among the collagen
hydroxylase inhibitors in that it does not inhibit the enzyme reaction but
specifically reduces the mRNA level of lysyl hydroxylase 1, and concurrently
also the amount of the enzyme, whereas the mRNA levels of type I collagen
prolyl 4-hydroxylase are unchanged [156-158].
4
Collagen Glycosyltransferases 


Collagens contain carbohydrate units, either the monosaccharide galactose or 
the disaccharide glucosylgalactose, linked to hydroxylysine residues [6, 17]. The 
extent of glycosylation of hydroxylysine residues and the ratio of galactosyl- 
hydroxylysine to glucosylgalactosylhydroxylysine vary markedly between col- 
lagen types and within the same collagen type in various physiological and 
pathological states [159]. The functions of the hydroxylysine-linked carbohy- 
drate units are not fully understood, but in the case of the fibril-forming col- 
lagens they influence the lateral packing of the collagen molecules into fibrils 
and the diameters of these fibrils both in vivo and in vitro [160, 161]. 

Fig. 10 Reactions catalyzed by hydroxylysyl galactosyltransferase (A) and
galactosylhydroxylysyl glucosyltransferase (B)

The formation of the hydroxylysine-linked carbohydrate units is catalyzed
by two specific enzymes, hydroxylysyl galactosyltransferase and
galactosylhydroxylysyl glucosyltransferase (Fig. 10) [159]. The former has been partially
purified from various sources, the activity being found in gel filtration in three
forms with molecular weights of about 450 and 200 kDa and a minor species 
of about 50 kDa [162], while the latter has been purified to homogeneity 
and found to have a molecular weight of 72-78 kDa in SDS-PAGE analysis, 
but a lower molecular weight in gel filtration, probably on account of re- 
tarded elution due to partial adsorption into the column material [163]. 
These two enzymes have not been cloned yet, but interestingly, human lysyl 
hydroxylase 3 and the single C. elegans lysyl hydroxylase have been found 
to possess both collagen galactosyltransferase and glucosyltransferase 
activities [9-11, 164], although their levels in the human lysyl hydroxylase 
3 polypeptide are so low that they may be of little biological significance. 
Furthermore, previous data suggesting that the two glycosyltransferase
reactions may be catalyzed by separate enzymes in vertebrates [31, 162, 163]
support the existence of additional collagen glycosyltransferases that are likely
to be responsible for most of the collagen glycosylation that takes place in vivo. 
The collagen glycosyltransferase activities of lysyl hydroxylase 3 reside in the 
N-terminal domain A, which has no role in the hydroxylase activity of the 
enzyme (Fig. 5) [10]. Site-directed mutagenesis studies have identified Cys144 
and Leu208 in this human polypeptide as important for its glucosyltransferase 
activity and Cys144 and Asp187-191 as being important for galactosyltrans- 
ferase activity, whereas none of these residues plays a role in hydroxylase 
activity [164]. 

Collagen galactosyltransferase catalyzes the transfer of galactose from 
UDP-galactose to hydroxylysine residues, while collagen glucosyltrans- 
ferase catalyzes the transfer of glucose from UDP-glucose to galactosyl- 
hydroxylysine residues (Fig. 10) [159]. The free ε-amino group of the
hydroxylysine residue is an absolute requirement for both reactions, which are
also influenced by the amino acid sequence of the peptide, the peptide chain
length and the peptide conformation, the triple-helical conformation
preventing both glycosylations [159]. Both reactions require a bivalent cation,
preferably Mn2+ [159].
Abstract A large number of extracellular proteins are synthesized in vivo as precursor mol-
ecules that must be proteolytically processed into their mature functional forms. Included 
among such proteins are a subset of the collagens. A number of the proteinases involved 
in the biosynthetic processing of collagen precursor molecules have fairly recently been 
identified and characterized. It is now evident that the proteinases involved in processing 
collagen precursors are responsible for the biosynthetic processing of various other kinds 
of extracellular proteins as well. The result is coordination of biosynthesis of the various 
processed collagens, and the formation of macromolecular structures formed by some of 
them, with other molecular events in morphogenesis and homeostasis. In this chapter, the 
natures of recently characterized families of proteinases involved in collagen processing are 
summarized, as are recently described ancillary proteins that enhance collagen processing. 
Also summarized are noncollagenous substrates that have recently been demonstrated to be
processed by proteinases which also process collagens and whose biosyntheses are therefore 
linked with collagen biosynthesis. 


Keywords Metalloproteinases · ADAMTS · Bone morphogenetic protein-1 · Tolloid ·
Proprotein convertases
mTLL     Mammalian Tolloid-like
PCPE     Procollagen C-proteinase enhancer
pNP      Procollagen N-proteinase
SIBLING  Small integrin binding ligand N-linked glycoprotein
SLRP     Small leucine rich proteoglycans
TGF-β    Transforming growth factor β
1

Introduction 


Proteolytic processing is a mechanism for the intracellular and extracellular 
regulation of protein function. Such processing can be catabolic, and act in the 
breakdown and removal of proteins, or anabolic, and represent important steps
in the biosynthesis and functional activation of proteins. This chapter will deal 
with what is presently known concerning the anabolic aspects of collagen pro- 
cessing. These include those processing interactions involved in the biosyn- 
thesis and functional maturation of the fibrillar collagen types I-III, V and XI, 
and the nonfibrillar collagen type VII; and those processing interactions that 
appear to be involved in altering functions of the transmembrane collagens 
XIII, XVII and XXV, via shedding from the cell surface. Proteolytic processing 
of the multiplexin collagen types XV and XVIII to yield anti-angiogenic frag-
ments such as endostatin will be addressed, although it is not yet clear which 
of the multiple cleavages of these collagens may be involved in generating phys- 
iologically functional fragments, and which are involved in catabolic turnover. 
However, catabolic degradation of collagens by the matrix metalloproteinases 
(MMPs) will not be covered herein, and for this subject the reader is directed 
to several existing reviews [1-3]. 

This chapter will also deal with the natures of various noncollagenous sub- 
strates recently shown to be biosynthetically processed by the same proteinases 
that biosynthetically process the major fibrillar collagens, since processing of 
different substrates by the same proteinases suggests co-regulation of the in 
vivo events in which these different substrates participate. Thus, the nature 
of noncollagenous substrates cleaved by the same proteinases that process 
collagens provides insights into the types of in vivo molecular events with 
which collagen biosynthesis is coordinated. 


2 
Processing of the Major Fibrillar Collagens 


Collagen types I-III constitute the major fibrous components of vertebrate 
extracellular matrix (ECM). All three of these major fibrillar collagens are syn- 
thesized as procollagen precursors that differ from mature collagen monomers 
in that they contain NH2- and COOH-terminal peptide extensions, known as 
N- and C-propeptides, respectively (Fig. 1). These are cleaved, thus releasing the 
central triple helical portion of the molecule which serves as a mature mono- 
mer capable of associating into fibrils [4]. Type I procollagen is a heterotrimer 
comprising two pro-α1(I) chains and one pro-α2(I) chain. Mutations which 
remove the site for cleavage of the N-propeptide portion of the pro-α1(I) or 
pro-α2(I) chains result in the heritable connective tissue disorders 
Ehlers-Danlos syndrome (EDS) types VIIA and VIIB, respectively [5, 6]. These are marked 
by joint hypermobility and multiple joint dislocations [7]. Deficiency in levels 
of the proteolytic activity responsible for cleaving both the pro-α1(I) and 
pro-α2(I) N-propeptide regions results in EDS type VIIC, marked by extreme 
fragility of the skin [7]. In EDS VIIB, there is little effect on fibril formation 
in dermis, whereas dermal fibrils are irregular in EDS VIIA and are even 
more so in EDS VIIC [7]. Thus, impairment of cleavage of the type I procolla- 
gen N-propeptide is not incompatible with in vivo fibrillogenesis, although 
these defects in N-propeptide processing can lead to abnormal fibril mor- 
phology in tissues. Consistent with what has been observed in vivo, studies with 
in vitro fibrillogenesis systems have shown that collagen monomers which 
retain the N-propeptide are readily incorporated into growing fibrils along 
with normal monomers, although inclusion of the abnormal monomers creates 
fibrils with abnormal morphologies [8]. In contrast, type I collagen monomers 
which retain the bulkier C-propeptide are not incorporated into growing fibrils 
Fig. 1 Processing of type I procollagen N- and C-propeptides by ADAMTS-2 and BMP-1, 
respectively 


in vitro [8], and no heritable disorders have yet been associated with inability 
to cleave the C-propeptide. Thus, it may be that inability to cleave major fibrillar 
procollagen C-propeptides is incompatible with fibrillogenesis, and therefore 
with morphogenesis and viability. The inhibitory effects of uncleaved C-propep- 
tides on fibrillogenesis may be twofold: 1) via interfering with self-association 
of collagen monomers by markedly increasing their solubility; and 2) via steric 
hindrance of the highly ordered packing of monomers necessary for fibrillo- 
genesis [4, 8]. 


2.1 
N-Proteinases and the ADAMTS Family of Metalloproteinases 


Accumulated biochemical evidence has shown both N- and C-propeptides of the 
major fibrillar procollagens to be cleaved by Ca²⁺-dependent metalloproteinases 
with neutral pH optima and well defined spectra of proteinase inhibitors [9-13]. 
Cleavage of the N-propeptides of procollagen type I and the homotrimeric pro- 
collagen type II is via a procollagen type I N-proteinase (pNPI) activity that 
cleaves at a specific Pro-Gln site in pro-α1(I) chains and a specific Ala-Gln site 
in pro-α2(I) chains, and which has been well characterized in terms of modes 
of inhibition [9, 10, 14, 15]. Interestingly, this activity does not cleave heat de- 
natured type I procollagen [15, 16], indicating a dependence on native confor- 
mation of the substrate. It has been reported that pNPI activity does not 
process the homotrimeric procollagen type III [9, 10, 16], the N-propeptide of 
which has been reported to be cleaved via a procollagen type III N-proteinase 
(pNPIII) activity that does not cleave procollagen type I [17-19]. Biochemical 
isolation of a protein with pNPI activity and partial amino acid sequencing [20] 
led to cloning and characterization of full-length cDNA sequences for both 
bovine and human forms of a pNPI proteinase [21, 22]. Furthermore, mutations 
in the gene that encodes this pNPI were shown to underlie both the EDS 
VIIC phenotype in humans and the analogous bovine disease dermatosparaxis 
[22]. Sequence comparisons showed the pNPI to belong to the ADAMTS (A Disintegrin 
And Metalloproteinase with ThromboSpondin motifs) family of metalloproteinases 
[23], of which there are presently 19 reported vertebrate family 
members [24, 25]. Since pNPI was only the second ADAMTS proteinase to be 
identified, it has been designated ADAMTS-2. The ADAMTS proteinases share a 
common domain structure (Fig. 2) and resemble the ADAM (A Disintegrin And 
Metalloproteinase) family of proteinases in having pro-, adamalysin/reprolysin- 
like metalloprotease, disintegrin-like and cysteine-rich domains [23, 26]. How- 
ever, they differ from many ADAM proteinases in lacking transmembrane 
domains and, instead, they possess variable numbers of thrombospondin type 
I-like repeats [23, 26] which, at least in some ADAMTS proteinases, appear to 
be involved in binding to ECM components [27]. Evidence indicates that the 
ADAMTS proteinases play a broad spectrum of roles in development, repro- 
duction, disease and homeostasis. For example, ADAMTS-1, -4 and -5/11, 
which form a subset of ADAMTS proteinases based on degree of sequence 
homology and similarities in protein domain structure, cleave the proteogly- 
can aggrecan, and this aggrecanase activity is important both to the home- 
ostasis of cartilage and to etiology of the arthritides [28-30]. ADAMTS-1 was 
originally identified as an inflammation-induced gene product associated with 
cachexia [23], while the phenotype of Adamts1-null mice also suggests roles for 
ADAMTS-1 in growth, organogenesis and female fertility [31]. ADAMTS-1 and 
ADAMTS-8 have both been shown to have anti-angiogenic activity [32], which 
in the case of ADAMTS-1 may involve binding and sequestration of the angiogenic 
factor VEGF165 by ADAMTS-1 COOH-terminal thrombospondin-like 
domains [33]. Roles for ADAMTS-like proteinases in fertility and organogen- 
esis in a broad range of species are suggested by the finding that the gon-1 gene, 
which encodes an ADAMTS-like product, is essential for gonadal morpho- 
genesis in Caenorhabditis elegans [34]. Mutations in the gene for ADAMTS-13 
are causal for thrombotic thrombocytopenic purpura, perhaps due to a demon- 
strated ability to process von Willebrand factor, suggesting ADAMTS-13 to 
play an important role in human vascular homeostasis [35]. 

Aside from the role of ADAMTS-2/pNPI in the biosynthetic processing 
of procollagens, the phenotype of Adamts2-null mice suggests a role in male 
fertility as well, since male knockout mice were sterile, with an apparent defect 
in the maturation of spermatogonia [36]. Interestingly, ADAMTS-2 is expressed 
at dramatically higher levels in seven days post-conception (dpc) murine gastrulas 

Fig. 2 Shared protein domain structure of ADAMTS-2, 3, and 14. Protein domains are: 
signal peptide (S), Pro-, Protease, disintegrin-like (Dis), 1st, 2nd, 3rd, and 4th thrombospondin 
type-1 (Tsp1, 2, 3, and 4), cysteine-rich (Cys), and spacer domains. The dashed region 
within the protease domain represents the Zn²⁺-binding active site. The dashed region 
within the COOH-terminal domain indicates the relatively conserved PLAC region 

than at later gestational times [37], even though mRNAs of the major 
fibrillar collagens are at relatively low levels at 7 dpc and are found at abundant 
levels only markedly later in development [38]. This difference in temporal pat- 
terns of expression suggests that, in addition to its role as pNPI, ADAMTS-2 
is likely to be involved in proteolytic processing of substrates other than 
the major fibrillar procollagens. Importantly, it has recently been shown that 
ADAMTS-2 can cleave procollagen type III, in addition to being able to cleave 
procollagen types I and II [39]. Thus, ADAMTS-2 is both a pNPI and a pNPIII, 
and the previous distinction made between pNPI and pNPIII as activities of 
separate proteinases appears to have been a false one. In addition, ADAMTS-3 
and -14, which have greater similarities in sequence and domain structure to 
ADAMTS-2 than do other ADAMTS proteinases, have been shown to have pNPI 
activity in vitro, and have been suggested as possible sources for the residual 
pNPI activity observable in the bone, tendon, cartilage, skin, and other tissues 
of EDS VIIC patients, dermatosparaxic cattle, and Adamts2-null mice [37, 40]. 
It has yet to be determined whether ADAMTS-3 and -14 have both pNPI and 
pNPIII activities, although in light of the results with ADAMTS-2 [39] and the 
similar structures of ADAMTS-2, -3, and -14, this seems likely. Thus, ADAMTS- 
2, -3, and -14 constitute an ADAMTS subfamily that appears to serve as the 
major, if not sole, source of pNP activity in vivo. 


2.2 
C-Proteinases and the BMP-1/Tolloid Family of Metalloproteinases 


If inability to cleave the C-propeptides of fibrillar procollagens is indeed in- 
compatible with fibrillogenesis, then the enzymatic activity responsible for such 
cleavage could be considered a key control point in morphogenetic processes. 
The proteinase(s) responsible for this activity might also be considered a rea- 
sonable target for therapeutic interventions in the pathological over-deposition 
of collagenous ECM, such as occurs in the fibroses. Small amounts of procolla- 
gen C-proteinase (pCP), the activity that cleaves the C-propeptides of procolla- 
gen types I-III, have been purified from the conditioned media of chick embryo 
tendon organ cultures [11] and cultured mouse fibroblasts [12, 41]. Isolation of 
sufficient amounts of pCP for NH2-terminal sequencing of proteolytic fragments 
showed pCP to be identical to the previously cloned gene product bone 
morphogenetic protein-1 (BMP-1) [13, 42]. Prior to the identification of BMP-1 
as a pCP, peptides derived from BMP-1 had been identified in demineralized 
bone extracts capable of inducing ectopic endochondral bone formation when 
implanted into the soft tissues of experimental animals [43]. Extensive purifi- 
cation of this bone morphogenetic activity from demineralized bone extracts 
found the activity to correspond to fractions containing peptides derived 
from BMP-1 and from certain members of the transforming growth factor-β 
(TGF-β) family of proteins, which were designated BMPs-2 through -7 [43-45]. 
BMP-1 was unlike the TGF-β-related proteins, and had a distinct protein domain 
structure that included a conserved domain found in the astacin family 
of metalloproteinases [46]. In addition, BMP-1 contains an NH2-terminal 
prodomain that must be proteolytically removed for activation to occur [46, 47]; 
CUB (Complement-Uegf-BMP-1) domains, first characterized in complement 
components C1r/C1s [48] and thought to mediate protein-protein interactions 
in various proteins implicated in developmental processes [49]; and an EGF-like 
motif that may also be involved in protein-protein interactions [46], but 
which in some proteins binds Ca²⁺ [50]. Experiments with a series of recombinant 
truncated forms of BMP-1 have shown that BMP-1 lacking both the EGF 
motif and the most COOH-terminal CUB domain retains much of its pCP 
activity [51]. Thus, the functions of the more COOH-terminal domains, including 
additional COOH-terminal domains found on other pCPs (see below), remain 
to be determined. Such functions may include interactions with substrates 
other than the major procollagens (see below) and/or targeting to specific areas 
of the extracellular compartment via binding to specific ECM components. 
Although the domain structure of BMP-1 was unique at the time of discov- 
ery, BMP-1 has since become the prototype of a family of similarly structured 
metalloproteinases (Fig. 3) implicated in morphogenetic processes in a broad 
range of species. Soon after discovery of BMP-1, the Drosophila protein Tolloid 
(TLD) was found to have a domain structure highly similar to that of BMP-1 
[52] (Fig. 3). Tolloid, the product of one of at least seven zygotically active genes 
necessary to formation of the dorsal-ventral axis in early Drosophila embryogenesis, 
acts by potentiating the activity of the TGF-β-like protein Decapentaplegic 
(DPP) [53, 54]. DPP is more similar in sequence to BMP-2 and BMP-4 
than to any other vertebrate TGF-β-like molecule [55]. Moreover, these proteins 
are functional orthologues, as DPP and BMP-2/-4 are capable of functionally 
substituting for each other and of subserving the same roles in invertebrate 
and vertebrate dorsal-ventral patterning [55]. Co-purification of BMP-1 with 
BMP-2/-4 through a large number of chromatographic steps suggested molecular 

Fig. 3 Domain structures of BMP-1-like proteinases: Black denotes signal peptides; dark 
gray, prodomains; light gray, protease domains; white, CUB domains; spotted, EGF-like 
domains; hatched, domains unique to each protein. Chevrons denote alternative splicing 
events that produce BMP-1 and mTLD from the same gene 

interactions between these molecules, although the nature of such interactions 
was initially obscure, since proteolytic activation of TGF-β-like proteins 
is via cleavage at consensus Arg-X-X-Arg sites by furin-like proprotein convertases 
[56]. Nevertheless, the above observations suggested functional interaction 
between BMP-1/TLD-like proteinases and DPP/BMP-2/-4 growth factors across 
a wide range of species. It is now known that TLD acts in dorsal-ventral pat- 
terning by cleaving the secreted protein Short gastrulation (SOG) [56], which 
releases DPP from a latent DPP-SOG complex. Similarly, the BMP-1/TLD-related 
non-mammalian vertebrate proteinases Xenopus Xolloid [57] and zebrafish 
Tolloid [58] (Fig. 3) are capable of in vitro cleavage of the SOG orthologue 
Xenopus Chordin, and of counteracting the dorsalizing effects of Chordin upon 
co-overexpression in Xenopus [57] or zebrafish [58] embryos. 

Mammals have four BMP-1/TLD-related proteinases. The gene that encodes 
BMP-1, also encodes alternatively spliced mRNA for a second, longer proteinase 
which, with an additional EGF-like domain and 2 additional COOH-terminal 
CUB domains, has a domain structure essentially identical to Drosophila TLD 
and which has therefore been designated mammalian Tolloid (mTLD) [59]. In 
addition, there are two genetically distinct proteinases, designated mammalian 
Tolloid-like 1 [60] and 2 [61] (mTLL-1 and mTLL-2), with domain structures 
identical to that of mTLD (Fig. 3). BMP-1 and mTLL-1, but not mTLD or 
mTLL-2, are capable of cleaving mammalian Chordin in vitro and of counter- 
acting the dorsalizing effects of Chordin upon co-overexpression in Xenopus 
[61]. In addition, although full-length Chordin is not readily detectable in conditioned 
media of mouse embryo fibroblast (MEF) cultures derived from wild 
type mice or from knockout mice lacking either the Bmp1 gene, which encodes 
both BMP-1 and mTLD, or the Tll1 gene, which encodes mTLL-1, full-length 
Chordin is readily detectable in media of MEFs derived from embryos doubly 
homozygous null for both the Bmp1 and Tll1 genes [62]. The various 
data thus demonstrate that BMP-1 and mTLL-1 are together responsible for 
cleaving Chordin in vivo and that these two proteinases are therefore 
involved in potentiating signalling by BMP-2 and -4 in mammalian morphogenetic 
and homeostatic events. 

All four mammalian proteinases have been shown to have pCP activity in 
vitro [61, 62], with BMP-1 and mTLL-2 demonstrating the most robust and 
lowest levels of activity, respectively [61, 62]. Moreover, levels of pCP activity are 
reduced in the culture media of Bmp1-null MEFs [63], whereas pCP activity is 
undetectable in media of MEFs derived from embryos doubly homozygous null 
for the Bmp1 and Tll1 genes [62]. Thus BMP-1, mTLD and mTLL-1 are all likely 
to be physiologically relevant pCPs in mammalian tissues. Absence of detectable 
pCP activity in Bmp1:Tll1 doubly null MEF media [60] argues against 
a major role for mTLL-2 as an in vivo pCP, especially since mTLL-2 mRNA is 
expressed by MEFs [64]. However, small but detectable amounts of apparently 
mature type I collagen monomers found in ECM associated with Bmp1:Tll1 
doubly null MEF cell layers [62] suggest that some residual pCP activity may 
remain, perhaps provided by mTLL-2. It is also possible that if mTLL-2 does not 
play a major role as a pCP in MEF cultures, it may still play significant roles in 
other cells and tissues. 

The above data indicate that at least some of the proteinases that act as pCPs 
in vivo also serve to modulate the activities of the important morphogens 
BMP-2 and -4, in vivo. Thus, these proteinases would appear to provide some 
degree of coordination between formation of the collagenous ECM and BMP 
signaling in mammalian morphogenetic events. 


2.3 
Major Procollagen Processing in the Context of Secretion and Fibrillogenesis 


Of some interest are the questions of where proteolytic processing of the major 
procollagens occurs in regard to the collagen producing cell, and when it occurs 
relative to other steps in the process of fibrillogenesis. BMP-1-like proteinases, 
ADAMTS-2, and the major fibrillar procollagens are all secreted proteins [4, 39, 
65]. Moreover, the neutral pH optima of both pNPs and pCPs [9-12] suggest 
that these proteinases operate most efficiently in the extracellular space, given 
the acidic milieu of intracellular compartments. Nevertheless, it has been 
reported that BMP-1 may, in at least some circumstances, be activated within 
the trans-Golgi compartment, concomitant with the process of sialylation [47]. 
It has also been noted that if, as some data suggest, BMP-1-like proteinases are 
responsible for cleaving the prodomain of the small proteoglycan decorin 
(see below), then such cleavage may occur within the Golgi apparatus prior to 
elongation of decorin glycosaminoglycan chains [64]. Thus, although many 
studies have demonstrated that both pCP activity and unprocessed procollagen 
are secreted into the extracellular space in cell or organ culture, or demonstrated 
C-propeptide cleavage in cell-free systems, the possibility exists that some cleav- 
age of procollagen may occur intracellularly in the trans-Golgi compartment. 
Classical electron microscopy studies have suggested that fibrillar procolla- 
gens form intracellular aggregates, rather than being secreted in monomeric 
form, and that nascent fibrils may grow within recesses of the fibroblast cell 
surface via addition of these intracellular aggregates [66, 67]. This view has re- 
ceived support in recent years by studies demonstrating that procollagen forms 
large electron-dense aggregates in the Golgi, that are too large for secretion via 
transport vesicles and which are instead transported across the Golgi complex 
via progressive maturation of the Golgi cisternae [68]. Interestingly, it has been 
reported that both pCP and pNP activities cleave aggregated procollagen at 
markedly more rapid rates than they do monomeric procollagen [69]. Thus, it 
seems likely that major fibrillar procollagens are cleaved as macromolecular 
aggregates by BMP-1-like proteinases, and it may be that such cleavage occurs 
either immediately before, or coincident with secretion. The distinction between 
intracellular and extracellular may be a fine one in this case, since cleavage may 
occur within fully matured Golgi cisternae as they open into recesses of the 
fibroblast cell surface, resulting in a pH shift towards neutral in the open 
cisternae. Certainly, the above speculations are consistent with studies that have 
shown the processing of major procollagen C-propeptides to be extremely 
rapid in tissues, and likely coincident with secretion in a pericellular environ- 
ment [70, 71]. In contrast to BMP-1-like proteinases, fully activated, mature 
ADAMTS-2 is not detectable intracellularly, and only appears extracellularly, 
seemingly bound to extracellular heparan sulfate proteoglycans and perhaps 
to various components of the ECM [20, 39]. Thus, pNP activities may process 
procollagen aggregates in a truly extracellular fashion. 


2.4 
C-Proteinase Enhancer Proteins 


Initial characterization of pCP activity from mammalian sources found this 
activity to be enhanced by a 55-kDa glycoprotein, designated the procollagen 
C-proteinase enhancer (PCPE), and by 36- and 34-kDa proteolytic fragments 
of PCPE [41, 72]. PCPE contains two NH2-terminal CUB domains and a COOH-terminal 
NTR (netrin) domain [73]. The latter has homology with the COOH-terminal 
domains of netrins, which are involved in axon guidance; with the 
NH2-terminal domains of TIMPs (tissue inhibitors of matrix metalloproteinases); 
with complement components C3-5; and with the frizzled-related 
proteins, which are extracellular antagonists of the Wnt family of extracellular 
ligands [74]. The 36- and 34-kDa PCPE fragments, which contain little or no 
sequences other than the two CUB domains [73], retain full pCP-enhancing 
activity and, like the full-length 55-kDa form, bind type I procollagen C-propeptides 
[41, 72]. Thus, such abilities appear to reside entirely in the CUB motifs. Interestingly, 
sequence homologies between PCPE and BMP-1 CUB domains were 
one of the attributes that first suggested BMP-1 as a candidate pCP [73], and 
these homologies suggest that BMP-1, which like PCPE binds procollagen 
C-propeptides [13], may do so via its CUB domains. In contrast to activities 
provided by the PCPE CUB domains, the cleaved COOH-terminal NTR domain, 
consistent with its homology to TIMPs, has the ability to inhibit MMPs [75]. 
Thus, PCPE may play a dual role in collagen deposition, by enhancing pCP 
activity via its CUB domains and by preventing collagen degradation, via MMP 
inhibition by its COOH-terminal domain. It should be noted, however, that the 
cleaved PCPE NTR domain inhibits MMP-2 (a gelatinase) with an IC50 value of 
560 nmol/l, compared to an IC50 value of 1.6 nmol/l for TIMP-2, suggesting that 
a metalloproteinase(s) other than the MMPs, or at least other than MMP-2, may 
be the major target of this fragment [75]. It is interesting to speculate that what- 
ever functions are provided by full-length PCPE, it may serve as a precursor 
from which functional NH2- and COOH-terminal products are derived. PCPE 
may also serve additional roles, since disruption of the rat PCPE gene can result 
in anchorage-independent growth and loss of contact inhibition in cultured 
fibroblasts [76], although a mechanism for these latter observations is not clear. 

PCPE has no intrinsic pCP activity but, rather, appears to act by binding 
procollagen with a 1:1 stoichiometry and, in so doing, facilitating procollagen 
C-propeptide cleavage, perhaps via inducing conformational changes in the 
substrate [72, 77]. The enhancing activity of PCPE seems specific to pCP activity 
and does not extend to pNP activity, or to cleavage of other substrates of 
BMP-1/Tolloid-related proteinases, such as Chordin or type V procollagen (BM 
Steiglitz and DS Greenspan, unpublished data). Recently, a second PCPE (PCPE2) 
has been described, with a protein domain structure and levels of pCP-en- 
hancing activity similar to those of PCPE (redesignated PCPE1) [78, 79]. The 
two PCPEs differ widely in their distributions of expression: PCPE1 has a broad 
distribution of expression throughout developing mesenchymal tissues, with 
especially high levels in ossifying bone [73, 79], while developmental expression 
of PCPE2 seems primarily localized to non-ossified cartilage [79]. Nevertheless, 
although PCPE1 is predominantly expressed in tissues rich in type I collagen 
and PCPE2 is predominantly expressed in cartilage, in which type II collagen 
is the major fibrillar collagen, the two PCPEs are equally efficient in enhancing 
cleavage of procollagen I and II C-propeptides [79]. Surprisingly, both PCPE1 
and 2 were found capable of binding to collagen fibrils in tissues and to collagen 
molecules devoid of C-propeptides in vitro [79]. The same studies also showed 
pCPs such as mTLL-1 to be capable of binding the triple helical portions of 
collagen monomers at the same types of sites used by PCPE1 and 2. Previous 
studies had shown that PCPE1 increases the maximal velocity (Vmax) of the pCP 
reaction, but that it also decreases the apparent Km [72]. The observation that, 
at the relatively high ratios of PCPEs and pCPs to procollagen used in in vitro 
assays, PCPEs and pCPs compete for similar sites on the collagen triple helix 
[72] can explain how PCPEs lower the apparent Km of the pCP reaction in such 
assays. KD values for binding of PCPE1 to procollagen I and III C-propeptides 
have been estimated at ~170 and ~370 nmol/l, respectively [77], while KD values 
for PCPE binding to the collagen triple helix [68] and to type I procollagen [77] 
have been estimated at ~10 and ~1 nmol/l, respectively. Thus, although PCPE1 
was first isolated, in part, via its ability to bind the type I procollagen C-propep- 
tide [72], the majority of procollagen-binding activity may derive from binding 
of PCPEs to the collagen triple helical domain. The apparent binding of both 
PCPEs and pCPs to a number of low affinity binding sites on the collagen triple 
helix may act to increase local concentrations of the factors required for biosyn- 
thetic processing by limiting their diffusion away from procollagen substrate, 
thus facilitating pCP activity at the C-propeptide cleavage site. 
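
As a rough illustration of what the dissociation constants quoted above imply, the following sketch computes fractional site occupancy for a simple one-site equilibrium, occupancy = [L] / ([L] + KD). The one-site model, the assumption that free PCPE is not depleted by binding, and the example PCPE concentration of 50 nmol/l are illustrative assumptions and are not taken from the cited studies; only the KD estimates come from the text.

```python
def fractional_occupancy(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of binding sites occupied at equilibrium for one-site binding.

    Assumes free ligand concentration equals total ligand concentration
    (ligand in excess), which is a simplifying assumption.
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand must be >= 0 and KD must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# Approximate KD estimates quoted in the text (nmol/l).
KD_ESTIMATES_NM = {
    "procollagen I C-propeptide": 170.0,
    "procollagen III C-propeptide": 370.0,
    "collagen triple helix": 10.0,
    "type I procollagen": 1.0,
}

if __name__ == "__main__":
    pcpe_nM = 50.0  # hypothetical PCPE concentration, chosen only for illustration
    for target, kd in KD_ESTIMATES_NM.items():
        theta = fractional_occupancy(pcpe_nM, kd)
        print(f"{target}: KD ~{kd:g} nmol/l, occupancy {theta:.2f}")
```

Under these assumptions, the lower KD values for the triple helix and for intact type I procollagen translate into markedly higher occupancy than for the isolated C-propeptides at the same PCPE concentration, which is in keeping with the argument that most procollagen binding may derive from the triple helical domain.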


2.5 
Noncollagenous Substrates of the BMP-1/Tolloid-Related C-Proteinases 


The sites at which BMP-1-related proteinases cleave procollagens I-III and 
Chordin have certain similarities. Most notably, each site contains an Asp residue 
at position P1’ and a residue with an aromatic side chain (i.e. Tyr or Phe) and/or 
Met residues NH2-terminal to the cleavage site, usually in position P3 or P2 
(Fig. 4). Numerous extracellular proteins are synthesized as precursors with 
prodomains that are proteolytically cleaved during biosynthesis, to yield the ma- 
ture functional protein. A subset of these precursors have in vivo cleavage sites 

Procollagen C-propeptides 
  α1(I)                  DGGRYYRA...DDANVVRD 
  α2(I)                  YDGDFYRA...DQPRSAPS 
  α1(II)                 DPLQYMRA...DQAAGGLR 
  α1(III)                GGFAPYYG...DEPMDFKI 
  α2(V)                  DPLPEFTE...DQAAPDDK 
  α1(VII)                RPLPSYAA...DTAGSQLH 
Laminin 5 γ2 chain       DTGDCYSG...DENPDIEC 
Lysyl oxidase            SHVDRMVG...DDPYNPYK 
SLRPs 
  Biglycan               DDGPFMMN...DEEASGAD 
  Decorin                GLFDFMLE...DEASGIGP 
  Osteoglycin            EKSLQLQK...DEVIPSLP 
  Epiphycan              TELFNYDS...EVYDAILE 
Chordin 
  N-terminal site        DPEHRSYS...DRGEPGVG 
  C-terminal site        KLGDPMQA...DGPRGCRF 
SIBLINGs 
  DMP1                   FDDEGMQS...DDPESTRS 
  DSPP 
Myostatin                IDQYDVQR...DDSSDGSL 
Procollagen N-propeptides 
  α1(V)                  AVPDTPQS...QDPNPDEY 
  α1(XI)                 SAPKAAQA...QEPOIDEY 
  α2(XI)                 GORERPQN...QOPHRAOR 


Fig. 4 Known and potential cleavage sites of BMP-1-like proteinases. Residues in boldface 
are conserved at the cleavage sites of either procollagen N-propeptides, or other substrates 


that resemble the cleavage sites at which mammalian BMP-1-like proteinases 
cleave procollagens I-III and Chordin. Such precursors have become candi- 
dates, and a number have been demonstrated to be, substrates of BMP-1-related 
proteinases. Known noncollagenous substrates are presented below. 
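
The similarity rule described at the start of this section (an Asp at P1' and a Tyr, Phe, or Met at P3 or P2) can be sketched as a simple check on sites written in the notation of Fig. 4, where "..." marks the scissile bond. The function name `describe_site` and the dictionary keys are hypothetical names introduced only for this illustration; the check is a minimal reading of the rule as stated in the text, not a validated predictor of BMP-1 substrates.

```python
# Residues the text names as typical at P2 or P3 (Tyr, Phe, Met).
AROMATIC_OR_MET = frozenset("YFM")


def describe_site(site: str) -> dict:
    """Summarise a cleavage site written as 'P8..P1...P1'..P8''.

    The three dots mark the scissile bond: the residue just before them
    is P1 and the residue just after them is P1'.
    """
    n_side, sep, c_side = site.partition("...")
    if not sep or len(n_side) < 3 or not c_side:
        raise ValueError("expected a site of the form 'XXX...XXX'")
    p2, p3 = n_side[-2], n_side[-3]
    return {
        "p1_prime_is_asp": c_side[0] == "D",
        "p2_or_p3_aromatic_or_met": bool(AROMATIC_OR_MET & {p2, p3}),
    }


# A few sites copied from Fig. 4.
EXAMPLE_SITES = {
    "procollagen alpha1(I) C-propeptide": "DGGRYYRA...DDANVVRD",
    "Chordin N-terminal site": "DPEHRSYS...DRGEPGVG",
    "biglycan": "DDGPFMMN...DEEASGAD",
    "osteoglycin": "EKSLQLQK...DEVIPSLP",
    "epiphycan": "TELFNYDS...EVYDAILE",
}

if __name__ == "__main__":
    for name, site in EXAMPLE_SITES.items():
        print(name, describe_site(site))
```

Applied to the sites listed in Fig. 4, such a check flags which candidate precursors share both features with the procollagen I-III and Chordin sites and which share only one of them.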


2.5.1 
Small Leucine-Rich Proteoglycans 


Biglycan and decorin are small nonaggregating proteoglycans that contain 
chondroitin sulfate or dermatan sulfate side chains and belong to the family of 
small leucine-rich proteoglycans (SLRPs) of the ECM [80]. The phenotypes of 
mice homozygous null for biglycan or decorin have shown the former to be 
a positive regulator of bone growth [81] and the latter a regulator of type I 
collagen fibrillogenesis in skin and tendon [82]. High levels of expression in 
preosteogenic cells and a pericellular distribution are consistent with a role for 
biglycan in osteoblast differentiation, whereas association of decorin expression 
with tissues rich in fibrillar collagens is consistent with a role in fibrillogenesis 
[83]. Although the molecular bases for the biological roles of biglycan 
and decorin are unclear, they may involve demonstrated abilities of the two 
to interact with various collagens, other ECM proteins, and TGF-β [84-90]. 
Biglycan and decorin are synthesized as pro-forms containing NH2-propeptides 
of 21 and 14 residues, respectively, that are completely removed in most, but not 
all connective tissues [91]. Residues at the human probiglycan and prodecorin 
in vivo cleavage sites, M(M/L)N-DEE and M(L/I)E-DE(A/G), are conserved in 
various species [91-96] and show similarities to the procollagen I-III C-propep- 
tide cleavage sites (Fig. 4), which made these proteins candidate substrates for 
the mammalian BMP-1-related proteinases. Studies have since shown that the 
pro-form of biglycan is efficiently cleaved in vitro by BMP-1 at the physiologi- 
cally relevant maturation site and not at any other site [64]. Related proteinases 
mTLD and mTLL-1 have lower levels of the same activity, whereas mTLL-2 lacks 
detectable levels of such activity [64]. Moreover, although only mature biglycan 
is detectable in wild type MEF conditioned media, a mixture of ~40% mature 
biglycan and 60% probiglycan is detectable in the media of Bmp1-/- MEFs, and 
only uncleaved probiglycan is detectable in culture media of MEFs doubly null 
for both the Bmp1 and Tll1 genes [64]. Thus, products of the Bmp1 and Tll1 genes 
are together responsible for all detectable probiglycan-processing activity in vivo, 
or at least in MEFs. Clearly, biosynthetic processing of biglycan, a positive regu- 
lator of bone growth, conforms with the roles of BMP-1, mTLD, and mTLL-1 as 
pCPs, an activity necessary to formation of the collagenous ECM of bone; and as 
activators of signaling by TGF-β-like BMPs, first characterized as inducers of 
bone formation. Such observations suggest BMP-1-like proteinases to be central 
in orchestrating events involved in bone formation. 

Although biosynthetic processing of decorin, a modulator of type I collagen 
fibrillogenesis, would also conform to the known roles of the BMP-1-related pro- 
teinases, it has yet to be determined which proteinase(s) is responsible for 
decorin processing. It must be noted, however, that if prodecorin and probiglycan 
are processed by the same proteinases, then it is with different kinetics, since 
transfected 293-EBNA cells that secrete only unprocessed recombinant pro- 
biglycan, secrete only fully processed mature recombinant decorin [64]. 

The SLRPs can be divided into four classes, based on sequence homologies and 
protein domain structures [80, 97]. Class I comprises decorin and biglycan, which 
show higher sequence homology to each other than is found in any other pair of 
SLRPs; class II comprises fibromodulin, lumican, keratocan, PRELP, and osteo- 
adherin; class III consists of chondroadherin; while class IV comprises epiphy- 
can and osteoglycin. While there is no evidence that class II or III SLRPs are 
processed from precursors to mature forms, in vivo biosynthetic processing has 
been reported for the two class IV SLRPs, osteoglycin [98] and epiphycan [99]. 
Very recent data show that osteoglycin has a prodomain that is cleaved, with vary- 
ing efficiencies, by all four mammalian BMP-1-related proteinases in vitro and 
that this prodomain is processed by wild type MEFs, but not by MEFs doubly 
null for the Bmp1 and Tll1 genes [100]. Thus, at least three of the mammalian
BMP-1-related proteinases are involved in the in vivo biosynthetic processing of
osteoglycin. This activity appears to conform with other activities of the BMP-1- 
like proteinases, since osteoglycin appears to play a role in regulating collagen 
fibrillogenesis [101]. Future studies are needed to determine whether precursors 
of other SLRPs, such as epiphycan, are processed by BMP-1-related proteinases.


2.5.2 
Lysyl Oxidase 


Lysyl oxidase is a secreted amine oxidase necessary for formation of the covalent 
cross links that give collagen and elastic fibers much of their tensile strength. It 
does this by oxidatively deaminating ε-amino groups of certain peptidyl lysine
and hydroxylysine residues within collagen monomers and peptidyl lysines
in elastin precursors. This results in conversion of the lysines to
peptidyl-α-aminoadipic-δ-semialdehydes and the hydroxylysines to
peptidyl-δ-hydroxy-α-aminoadipic-δ-semialdehydes, with subsequent spontaneous
condensation of these peptidyl aldehydes with other vicinal peptidyl aldehydes or
with unreacted ε-amino groups to form intra- and intermolecular covalent cross-links
[102]. Lysyl oxidase is synthesized as a zymogen or proenzyme, with a pro- 
domain that must be cleaved to produce the mature active form [103]. Because 
the site for cleavage contains a P1’ Asp (Fig. 4), pCP activity from chick embryo 
tendon organ culture was examined for its ability to activate lysyl oxidase [104]. 
It was found to cleave the proenzyme to produce mature active lysyl oxidase 
with an NH2-terminus identical to that of mature lysyl oxidase cleaved in media
of arterial smooth muscle cultures [104]. Subsequently it has been shown
that all four mammalian BMP-1-related proteinases are capable of cleaving pro- 
lysyl oxidase in vitro, and solely at the physiologically appropriate site [105]. 
Moreover, in contrast to wild type MEFs, MEFs doubly null for the Bmp1 and
Tll1 genes produce predominantly unprocessed prolysyl oxidase, and their media
have lysyl oxidase activity only 30% that of wild type [105]. Thus, Bmp1 and
Tll1 gene products are responsible for the majority of proteolytic activation of
lysyl oxidase, at least in MEF cultures, illustrating another way in which the 
mammalian BMP-1-related proteinases affect ECM formation. 


2.5.3 
SIBLING Proteins 


Noncollagenous proteins of the SIBLING (Small Integrin-Binding LIgand N- 
linked Glycoprotein) family, which includes dentin sialophosphoprotein (DSPP) 
and dentin matrix protein-1 (DMP1), are secreted and deposited into bone and 
dentin ECM during assembly and mineralization, and are thought to initiate 
mineralization through their acidic calcium binding domains [106-110]. 
Cleaved COOH-terminal domains of DSPP and DMP1 are found in extracts of 
mineralized tissues [111, 112], and are capable of stimulating in vitro hy- 
droxyapatite crystal formation [113, 114], prompting suggestions that in vivo 
processing of full-length DSPP and DMP1 occurs, yielding bioactive cleavage
products. The NH2-terminus of the cleaved DSPP COOH-terminal domain
extracted from dentin shows in vivo cleavage to have occurred between residues
Gly*” and Asp** [111], numbering from the initial Met of full-length preproD- 
SPP. Analysis of DMP1-derived cleavage products extracted from bone shows 
in vivo cleavage of DMP1 to have occurred at multiple sites, only one of which,
between Ser!% and Asp’’’, occurs within sequences strictly conserved within 
DMP1 in a broad range of species [112]. Similarities between the described 
cleavage sites of DSPP and DMP1 and previously characterized substrates
(Fig. 4) suggested these two proteins as candidate substrates for BMP-1-related
proteinases. Experiments have shown recombinant human DMP1 to be cleaved
to varying extents by all four mammalian BMP-1-related proteinases, yielding 
fragments similar in size to those isolated from bone [115]. Moreover, this 
cleavage occurs between Ser'?6 and Asp'?", the most conserved site within DMP1 
sequences from a wide range of species [115]. In addition, processing of DMP1 
is deficient in Bmp1-/-;Tll1-/- doubly homozygous null MEF cultures [115], further
supporting the possibility of roles for BMP-1-related proteinases in the in
vivo processing of DMP1. It is yet to be determined whether DSPP is similarly
processed by BMP-1-related proteinases. Nevertheless, it is conceptually attrac- 
tive that the same proteinases that regulate biosynthesis of type I collagen, the 
major structural protein of bone, and that activate TGF-β-like BMPs responsible
for osteoblastic differentiation, may also regulate activities of SIBLING proteins
important to the mineralization of bone and other hard tissues.


2.5.4 
Laminin-5 


Laminin-5, a non-collagenous heterotrimer composed of α3, β3, and γ2 chains,
is a major component of anchoring filaments within the basement membrane 
at the dermal-epidermal junction and is important in stabilizing attachment of 
epithelial hemidesmosomes to basement membranes [116]. Null mutations in 
any of the three genes encoding the laminin-5 chains lead to the lethal blister- 
ing genetic disease Herlitz's junctional epidermolysis bullosa [117]. 

Cultured keratinocytes secrete a laminin-5 precursor with 200 kDa α3 and
155 kDa γ2 chains that are processed upon secretion to yield 165 and 105 kDa
forms, respectively, corresponding to the sizes of α3 and γ2 chains isolated from
tissues [116, 118]. Exogenously added MMP-2 is capable of cleaving the γ2 chain
of rat laminin-5 [119], albeit at a site not conserved in the human laminin-5 γ2
chain [120]. Additional data have suggested that membrane type 1 matrix
metalloprotease (MT1-MMP) may be the proteinase primarily responsible for
cleavage of the rat laminin-5 γ2 chain, and that MMP-2 may play an ancillary
role [121]. However, only BMP-1-related proteinases have been shown to cleave
the human γ2 chain at the authentic site at which it is cleaved by keratinocytes
[122]. BMP-1-related proteinases, MMP-2, MT1-MMP, and other proteinases are
capable of cleaving the α3 chain to produce a form indistinguishable in size
from the keratinocyte processed form [120, 122, 123]. However, a small molecule
inhibitor that seems highly specific for the BMP-1-related proteinases
inhibits cleavage of α3 as well as γ2 chains in keratinocyte cultures. In addition,
neither MT1-MMP nor MMP-2 cleaves the human laminin-5 γ2 chain,
whereas processing of laminin-5 appears to be deficient in the skin of Bmp1
null mice [120]. All four BMP-1-related proteinases are capable of cleaving both
the α3 and γ2 chains [120], although two studies have suggested mTLD as the
major BMP-1-related proteinase secreted by keratinocytes [65, 120]. Thus,
considerable evidence supports roles for BMP-1-related proteinases, and perhaps
mTLD, in processing laminin-5 in vivo, although the question of which
proteinase(s) cleave laminin-5 in vivo should still be considered controversial at
this time. The question is of considerable importance, as the proteolytically 
processed form of laminin-5 is more conducive to epithelial cell migration than 
is the unprocessed form [119], suggesting that targeting the proteinase(s) re- 
sponsible for such processing may be an effective therapeutic approach towards 
limiting invasiveness of carcinomas. 


2.5.5 
Myostatin/GDF-8 


Myostatin, also known as growth differentiation factor 8 (GDF-8), is a TGF-β-like
protein that is expressed almost exclusively in cells of myogenic lineage
throughout development and in skeletal muscle in the adult, and it negatively 
regulates skeletal muscle mass [124-127]. Although the myostatin prodomain, 
like the prodomains of all TGF-β-like molecules, is cleaved by furin-like
proprotein convertases to yield the mature active molecule, the cleaved myostatin
prodomain remains non-covalently associated with mature myostatin as a latent
complex [124-126, 128]. Recently it has been shown that BMP-1, mTLL-1,
and mTLL-2 cleave within myostatin prodomain sequences, to liberate active 
mature myostatin from the latent complex [129]. Lower levels of activity were 
shown for mTLD [129]. The cleavage site resembles those of a number of other 
substrates of BMP-1-like proteinases, including existence of a P1' Asp [129]
(Fig. 4). Substitution of the myostatin P1' Asp with Ala makes the prodomain
impervious to cleavage by BMP-1-like proteinases and prevents activation of 
myostatin latent complexes in vitro. Importantly, systemic administration of 
the mutant myostatin prodomain to adult mice induces marked increases in 
muscle mass, whereas wild type prodomain does not have this effect [129]. 
Thus, cleavage at this site is an important activator of myostatin activity in vivo. 
While it is not clear which BMP-1-like proteinase may activate myostatin in 
vivo, it is of interest that mTLL-2 is specifically expressed by skeletal muscle 
during embryonic development, whereas BMP-1 has a broad developmental 
distribution of expression that includes muscle [61]. It is intriguing to speculate
whether such proteinases may operate in disease processes such as muscular
dystrophies, via potentiating fibrotic tissue infiltration while negatively regulating
skeletal muscle regeneration. 
3
Processing of the Minor Fibrillar Collagens 


The minor fibrillar collagens comprise low abundance fibrillar collagen types 
V and XI. Monomers of collagens V and XI are incorporated into growing fib- 
rils of the much more abundant major fibrillar collagens I and II, respectively, 
and regulate the sizes and shapes of the resultant heterotypic fibrils [130-136]. 
Type V collagen is most widely distributed in tissues as a heterotrimer of the 
chain composition α1(V)2α2(V) [137], but is found in a limited number of cell
types and tissues as a rare α1(V)3 homotrimer [138-140]. In addition, a poorly
characterized α1(V)α2(V)α3(V) heterotrimer has been isolated primarily
from human placenta [141, 142], but has also been reported in uterus, skin, and
synovial membranes [137, 143-145]. Localization of α3(V) RNA within nascent
ligamentous attachments of developing joints, membranous linings of developing
skeletal muscle, and developing and regenerating peripheral nerves in
mouse and rat [146, 147] suggests roles for α1(V)α2(V)α3(V) heterotrimer in
these tissues as well. Type XI collagen, in the form of an α1(XI)α2(XI)α3(XI)
heterotrimer [148], was first characterized as a minor collagen of cartilage. 
However, findings of type XI chains in noncartilaginous tissues [150], of type V 
chains in cartilage [150], and of cross-type heterotrimers composed of both 
type V and XI chains [151, 152] suggest that type V and XI chains constitute a 
single collagen type in which different combinations of chains associate in a tis- 
sue-specific manner. Unlike the major fibrillar collagens I-III, processed col- 
lagens V and XI retain residual N-propeptide sequences [139, 140, 153-157]. 
These appear to be of functional importance since, as shown for collagen V, they 
protrude beyond the surface of heterotypic fibrils and may control fibrilloge- 
nesis by sterically hindering further addition of collagen monomers to the fib- 
ril surface [155]. The minor fibrillar procollagens are similar in domain struc- 
ture to the major fibrillar procollagens, in consisting of a central collagenous 
domain of ~1000 amino acid residues in length, bracketed by N- and C-propeptides
[137]. However, the pro-α1(V), pro-α1(XI), pro-α2(XI), and pro-α3(V)
chains, more similar in sequence and domain structure to each other than to 
other fibrillar procollagen chains, differ most from the other procollagen chains 
in the configurations and large sizes of their N-propeptides (Fig. 5) [158-164]. 
The genes for these procollagen chains also share a conserved intron-exon struc- 
ture that is diverged from the conserved intron-exon structure of the major fib- 
rillar procollagen chain genes [161, 162, 165, 166], implying at least two evolu- 
tionary pathways that have led to distinct subclasses of fibrillar collagen chains. 
In contrast, domain structure of the pro-α2(V) chain (Fig. 5) and the structure
of its cognate gene resemble those of the major fibrillar collagens [167, 168]. 

Fig. 5 Processing of type V procollagen. SP denotes signal peptides. PARP and variable (VAR)
subdomains of the pro-α1(V) NH2-terminal globular sequences are shown. C-pro denotes
C-propeptides, and CR denotes the cysteine-rich subdomain of the pro-α2(V) N-propeptide

Low levels of minor fibrillar procollagens in tissues and cell cultures have
limited their characterization. Thus, to facilitate characterization of type V
procollagen processing, recombinant pro-α1(V)3 homotrimers were produced for
in vitro cleavage assays [169]. Recombinant pro-α1(V)3 homotrimers were used
for initial studies, as their production was more straightforward than production
of recombinant pro-α1(V)2pro-α2(V) heterotrimers, although the latter
are more representative of the majority of type V procollagen in vivo.
Surprisingly, secretion of recombinant pro-α1(V)3 homotrimers from cultures of
transfected human 293-EBNA cells was accompanied by cleavage of pro-α1(V)3
C-propeptides at a consensus (R/K)XRR site suitable for cleavage by furin-like
proprotein convertases [169]. This cleavage was partially inhibited by culturing
cells in the presence of 100 mmol/l L-arginine, resulting in intact pro-α1(V)3
homotrimers that could indeed be cleaved in vitro by a soluble form of furin,
and solely at the site used in cell cultures [169]. Incubation of intact pro-α1(V)3
homotrimers with BMP-1 did not lead to C-propeptide cleavage, but instead led
to cleavage at a single site within pro-α1(V) NH2-terminal globular sequences
[169] (Fig. 5). Non-cleavage of the C-propeptide by BMP-1 was surprising, since
the domain structure of pro-α1(V) C-propeptides is similar to that of the
major fibrillar procollagen C-propeptides, and because the pro-α1(V)
COOH-telopeptide, the short linker region that connects the main collagenous domain
to the C-propeptide, contains three Asp residues [158, 159], one of which might
have marked the P1' residue of a pCP cleavage site. Also surprising was the
finding that the BMP-1 cleavage site within pro-α1(V) NH2-terminal globular
sequences differed from previously characterized cleavage sites of BMP-1-related
proteinases in possessing Gln rather than Asp in the P1' position, and that
other residues flanking the scissile bond lacked resemblance to residues flanking
scissile bonds in previously identified substrates of BMP-1-like proteinases
[169] (Fig. 4). Nevertheless, the α1(V) NH2-terminus produced by cleavage at
this site corresponds to an equivalent NH2-terminus on the processed ECM
form of the similar α1(XI) collagen chain [156], consistent with physiological
significance for cleavage at this site. Conservation of residues surrounding the
NH2-terminal pro-α1(V) BMP-1 cleavage site (Fig. 4) and of furin proprotein
convertase recognition sites in COOH-telopeptide domains in the pro-α1(XI)
and pro-α2(XI) chains suggests that these related chains may be processed in
a fashion similar to that of the pro-α1(V) chain.
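
The (R/K)XRR consensus described above lends itself to a simple sequence scan. The following is a minimal sketch, assuming a one-letter amino acid string as input; the function name find_furin_sites and the toy sequence are hypothetical and are not taken from the cited studies. A motif match only flags a candidate site and says nothing about whether cleavage actually occurs.

```python
import re

# Furin-like consensus as stated in the text: (R/K)XRR, where X is any residue.
# A lookahead is used so that overlapping motifs are all reported.
FURIN_MOTIF = re.compile(r"(?=([RK][A-Z]RR))")


def find_furin_sites(sequence):
    """Return (motif_start, p1_prime_index, motif) for each (R/K)XRR match.

    Indices are 0-based. p1_prime_index is the index of the first residue
    after the motif, assuming cleavage immediately COOH-terminal to it.
    """
    seq = sequence.upper()
    return [(m.start(), m.start() + 4, m.group(1))
            for m in FURIN_MOTIF.finditer(seq)]


# Hypothetical toy sequence, not a real procollagen chain.
toy_sequence = "GPPGAKTRRSDAPLRVRRDE"
for start, p1_prime, motif in find_furin_sites(toy_sequence):
    print(f"motif {motif} at {start}, candidate cleavage before index {p1_prime}")
```

Candidate sites found this way would still need experimental confirmation, for example with the inhibitor and in vitro cleavage assays described in the text.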

A subsequent study with recombinant pro-α1(V)3 homotrimers showed that
blocking cleavage of the C-propeptide in cell cultures, by use of the highly
specific furin inhibitor decanoyl-RVKR-chloromethyl ketone, results in full-length
pro-α1(V)3 homotrimers that can be cleaved in vitro by BMP-1 at both the
NH2-terminal globular site identified in the previous study, and within the
COOH-telopeptide domain at a site with a P1' Asp [170]. This second pro-α1(V)3
homotrimer study, which used higher levels of BMP-1 activity than the first,
nevertheless found cleavage at the COOH-telopeptide site to be markedly less
efficient than cleavage at the site in NH2-terminal globular sequences. In a third
study, recombinant pro-α1(V)2pro-α2(V) heterotrimers were successfully
produced and employed in demonstrating that pro-α1(V) NH2-terminal globular
sequences are cleaved by BMP-1 at the same site as in pro-α1(V)3 homotrimers,
and that pro-α1(V) C-propeptides are still cleaved by furin in the context of
pro-α1(V)2pro-α2(V) heterotrimers [171]. In contrast, the C-propeptides of
pro-α2(V) chains within the same pro-α1(V)2pro-α2(V) heterotrimers were not
cleaved by furin, but were instead cleaved by BMP-1 at a site similar to those at
which the C-propeptides of the major fibrillar procollagens are cleaved, in that
it contained a P1' Asp and a P3 Phe (Figs. 4 and 5). Thus, furin cleavage of
procollagen C-propeptide sequences seems confined to procollagen chains of the
pro-α1(V)-like subclass, a likelihood supported by the lack of furin cleavage
consensus sequences in all of the major fibrillar procollagen COOH-telopeptide
regions, and the demonstration that neither procollagen I nor II is susceptible
to cleavage by furin in vitro [171].

In vivo significance of the patterns observed for in vitro processing of
recombinant type V procollagens by furin- and BMP-1-like proteinases was
ascertained by comparing processing of endogenous pro-α1(V) chains in cultures
of wild type and Bmp1;Tll1 doubly homozygous null MEFs [171]. In such
experiments, media of wild type cultures were found to contain pN-α1(V) chains
(from which C-propeptides had been removed, but which still retained full,
uncleaved NH2-terminal globular sequences) and mature α1(V) chains
(corresponding to the matrix form in which both C-propeptide and PARP
subdomain of the NH2-terminal globular domain have been removed) (Fig. 5), while
media of Bmp1;Tll1 doubly null MEFs contained only the pN-α1(V) form
[171]. Such results are wholly consistent with the probability that products of
the Bmp1 and Tll1 genes are responsible for cleaving pro-α1(V) N-propeptides,
but not C-propeptides, in vivo. In addition, culturing of MEFs in the presence
of the specific furin inhibitor decanoyl-RVKR-chloromethyl ketone resulted in
detection of only intact full-length pro-α1(V) chains and pC-α1(V) forms
(which retain the C-propeptide, but from which the PARP subdomain of the
NH2-terminal globular domain has been removed) (Fig. 5) in wild type media, and
only intact full-length pro-α1(V) chains in the media of Bmp1;Tll1 doubly null MEFs
[171]. These latter results are consistent with the probability that, whereas
products of the Bmp1 and Tll1 genes are responsible for cleaving pro-α1(V)
NH2-terminal sequences in vivo, furin-like activity is solely responsible for cleaving
pro-α1(V) C-propeptides in vivo.
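
The inference drawn from these four culture conditions can be laid out as a small consistency check. The sketch below is only an illustration under a deliberately simplified assumption: furin-like activity alone removes the C-propeptide, and Bmp1/Tll1 products alone remove the PARP subdomain. The function and variable names are hypothetical; the observed forms are those reported in the text for [171].

```python
def terminal_alpha1v_form(furin_like_active, bmp1_like_active):
    """Most fully processed pro-alpha1(V) form expected under the simplified model."""
    c_propeptide_removed = furin_like_active   # assumption: furin-like activity only
    parp_removed = bmp1_like_active            # assumption: Bmp1/Tll1 products only
    if c_propeptide_removed and parp_removed:
        return "mature"
    if c_propeptide_removed:
        return "pN"    # C-propeptide gone, NH2-terminal globular sequences intact
    if parp_removed:
        return "pC"    # C-propeptide retained, PARP subdomain gone
    return "pro"       # intact full-length chain


# Forms detected in MEF media, keyed by (furin_like_active, bmp1_like_active).
OBSERVED_FORMS = {
    (True, True): {"pN", "mature"},   # wild type
    (True, False): {"pN"},            # Bmp1;Tll1 doubly null
    (False, True): {"pro", "pC"},     # wild type + furin inhibitor
    (False, False): {"pro"},          # doubly null + furin inhibitor
}


def model_agrees_with_observations():
    """True if the predicted terminal form was detected in every condition."""
    return all(terminal_alpha1v_form(furin, bmp1) in forms
               for (furin, bmp1), forms in OBSERVED_FORMS.items())


print(model_agrees_with_observations())
```

The model predicts only the end product in each condition; the partially processed intermediates also seen in wild type media are compatible with it but are not predicted by it.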

The relative lack of similarity between residues flanking the BMP-1 cleavage
site in pro-α1(V) NH2-terminal globular sequences and previous sites cleaved
by BMP-1-like proteinases indicates that BMP-1-like proteinases, like other
astacin-like proteinases [46], are not highly specific for such residues and
suggests a reappraisal of features that influence cleavage of substrates by BMP-1-like
proteinases. In fact, it should be stressed that even the limited conservation
found in residues flanking cleavage sites in previously characterized substrates
of BMP-1-like proteinases (Fig. 4) is somewhat misleading, since 1) conserved
residues flanking sites for cleavage of C-propeptides of procollagens I-III and
pro-α2(V) chains may reflect the similar evolutionary origin of these chains
rather than a requirement for such residues for cleavage, and 2) a number of
other substrates have been selected as candidate substrates based on the
similarities of their in vivo cleavage sites to those of procollagens I-III. Nevertheless,
it should be noted that in at least some cleavage sites a P1' Asp is essential for
cleavage, since substitution of the P1' Asp by Ala at either the pro-α2(I)
C-propeptide cleavage site [172] or the site within myostatin prodomain sequences [129]
makes these sites impervious to cleavage by BMP-1-like proteinases.

Cleavage products of known substrates of BMP-1-like proteinases, such as
procollagen I [62], Chordin [62], and probiglycan [64], are absent from media
of Bmp1;Tll1 doubly null MEFs. This suggested that cleavage products of other
substrates would be absent as well. If so, such fragments, represented as spots
on 2D gels of wild type MEF media, would be absent from 2D gels of Bmp1;Tll1
null MEF media. Thus, identification of such spots via a proteomics-based
approach could lead to identification of novel substrates in a fashion that, unlike
the previous candidate approach, would be unbiased by preconceived notions
of features that a cleavage site for BMP-1-like proteinases must have [62]. In such
a proteomics approach, four spots present on a 2D gel of wild type MEF media, but
missing from a 2D gel of Bmp1;Tll1 null MEF media, were subjected to analysis by
mass spectrometry [62]. Three of the spots represented the C-propeptides of the
pro-α1 and pro-α2 chains of procollagen I and of the pro-α1 chain of
procollagen III, thus validating the approach as a means for identifying in vivo substrates
of Bmp1 and Tll1 gene products. The fourth spot represented the PARP
subdomain of pro-α1(XI) NH2-terminal globular sequences [62]. Moreover,
subsequent immunoblot analyses of materials from MEF media supported the
conclusion that the pro-α1(XI) NH2-terminal globular domain is indeed
processed by BMP-1-like proteinases and that furin-like proteinases are solely
responsible for cleaving pro-α1(XI) C-propeptides in vivo [62].
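
The logic of this differential comparison is essentially a set difference between the two gels. The sketch below illustrates it under the assumption that spots have already been matched between gels and given shared identifiers; the spot identifiers, the extra "s5" entry, and the function name are hypothetical, and the annotations simply restate the four identifications reported in the text.

```python
def spots_missing_in_null(wild_type_spots, null_spots):
    """Return spots present on the wild type gel but absent from the null gel.

    Both arguments map a spot identifier to an annotation. The result keeps
    the wild type annotations, ordered by spot identifier.
    """
    missing = set(wild_type_spots) - set(null_spots)
    return {spot: wild_type_spots[spot] for spot in sorted(missing)}


# Hypothetical spot identifiers; annotations follow the text.
wild_type = {
    "s1": "C-propeptide, pro-alpha1(I)",
    "s2": "C-propeptide, pro-alpha2(I)",
    "s3": "C-propeptide, pro-alpha1(III)",
    "s4": "PARP subdomain, pro-alpha1(XI)",
    "s5": "protein unaffected by genotype (hypothetical)",
}
null = {"s5": "protein unaffected by genotype (hypothetical)"}

for spot, annotation in spots_missing_in_null(wild_type, null).items():
    print(spot, annotation)
```

In practice, deciding that a spot is present or absent depends on gel alignment and intensity thresholds, which this sketch does not attempt to model.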

An independent study, using recombinant proteins for in vitro assays, also
demonstrated that BMP-1 cleaves within pro-α1(XI) NH2-terminal globular
sequences [173]. The latter study also showed that pro-α1(XI) isoforms,
differing in NH2-terminal globular variable subdomain (Fig. 5) sequences due to
alternative splicing [174, 175], differ in the efficiency with which they are
processed by BMP-1, although each isoform is cleaved at the same site [173]. It was
speculated that such differences in processing rates may result in fine regulation
of interactions between heterotypic type II/XI collagen fibers and other ECM
components, as such interactions may be mediated by α1(XI) NH2-termini [173].
Peptides extracted from fetal calf cartilage suggest additional proteolytic
processing of pro-α1(XI) NH2-terminal sequences by proteinases other than
BMP-1-like proteinases, at various sites [173], although it is not clear whether
these additional cleavages are physiologically relevant or represent artifactual
cleavages that occurred during tissue extraction. It should be noted that
alternative splicing of variable subdomain sequences has been observed for
pro-α1(XI) and pro-α2(XI) chains [162, 174, 175], but does not appear to occur in
pro-α1(V) chains [158].

Conservation of sequences for cleavage by furin-like proprotein convertases within the
pro-α3(V) COOH-telopeptide [146] suggests that cleavage of this domain will
be by furin-like enzymes. However, although residues at the P1', P3', and P2
positions of the pro-α1(V) NH2-terminal cleavage site are conserved at
corresponding positions in pro-α1(XI) and pro-α2(XI) NH2-terminal globular
sequences [169], such conservation is lacking in pro-α3(V) [146]. Thus, it remains
to be determined whether the pro-α3(V) NH2-terminal globular domain is
processed and, if so, at which location and by which proteinase(s). Another
unresolved aspect of the proteolytic processing of the minor fibrillar
procollagens concerns processing of the pro-α2(V) N-propeptide. It has been suggested
that the pro-α2(V) N-propeptide may be cleaved by the same pNP activity
that cleaves N-propeptides of major fibrillar procollagens, primarily because of
similarities in the domain structure of the pro-α2(V) and procollagen I-III
N-propeptides, but also because pro-α2(V) NH2-terminal globular sequences
contain the sequence Phe-Ser-Ala-Gln at a position similar to the
Phe/Tyr-Ser/Ala/Pro\ Gln sites at which pNP activity cleaves the N-propeptides of
procollagens I-III [168]. However, examination of the mature matrix form of α2(V)
chains from various tissues has suggested that NH2-terminal globular sequences are
retained in their entirety [140]. Thus, it will be of interest to determine whether the
pro-α2(V) N-propeptide is susceptible or resistant to cleavage by ADAMTS-2.

Cleavage of the C-propeptides of pro-α1(V)-like chains by furin-like
proteinases may add another level of regulation to collagen fibrillogenesis and link
fibrillogenesis to the many other processes governed by these enzymes, which
represent the major processing enzymes of the constitutive secretory pathway
[176, 177]. In its native form, furin has a transmembrane domain and is
primarily localized to the trans-Golgi, but cycles between the trans-Golgi and
plasma membrane [176, 177]. Thus, cleavage of pro-α1(V) C-propeptides may
occur in the trans-Golgi or at the cell surface. This would agree with early
observations that cleavage of pro-α1(V) C-propeptides is relatively rapid [137].
However, furin may also be released from the plasma membrane as a secreted
form [178], and thus extracellular cleavage of pro-α1(V) C-propeptides by
furin-like activities cannot be excluded.

Cleavage of the C-propeptides of pro-α2(V) chains, which by various
criteria are more similar to procollagen I-III chains than to pro-α1(V) chains, by
BMP-1-like proteinases, and the likely cleavage by BMP-1-like proteinases of the
C-propeptide of the pro-α3(XI) chain, a modified product of the type II
collagen pro-α1(II) chain gene [150], may serve to coordinate deposition of minor
and major fibrillar collagen monomers within heterotypic fibrils.

Quite recently, sequences for two novel fibrillar-like procollagen chains,
pro-α1(XXIV) and pro-α1(XXVII), have been mined from the genome databases
[179, 180]. Each of these chains shows certain similarities to the pro-α1(V),
pro-α1(XI), pro-α2(XI) and pro-α3(V) chains, and it will be of interest to determine
at which sites and by which proteinases they are processed.


4 
Processing of Type VII Collagen 


Type VII collagen, a nonfibrillar collagen composed of three identical α1(VII)
chains, is the major and perhaps sole component of anchoring fibrils, which 
are involved in attachment of external epithelia, such as epidermis, to under- 
lying stroma [181] (Fig. 6). Type VII collagen is produced as precursor pro- 
collagen molecules [182]. Upon secretion, these associate into antiparallel 
dimers, C-propeptides are cleaved, and anchoring fibrils are believed to form 
via lateral association of type VII collagen dimers into macromolecular struc- 
tures [182-184]. The functional importance of anchoring fibrils is evident in 
the severe blistering phenotype of dystrophic epidermolysis bullosa (DEB) 
that results from defective or absent anchoring fibrils caused by mutations
in the type VII collagen gene [185-187]. The procollagen C-propeptide, also 
known as the NC-2 domain, appears necessary for initial formation of type VII 
collagen antiparallel dimers and also contains Cys residues necessary to for- 
mation of disulfide bonds that stabilize the dimer [188, 189]. 

Fig. 6 Processing of type VII procollagen C-propeptides, and lateral assembly of antiparallel
dimers to form anchoring fibrils. Vertical lines connecting monomers represent disulfide bonds

Proteolytic removal of procollagen VII C-propeptides seems necessary for
proper formation of anchoring fibrils, as an in-frame exon-skipping mutation
that removes 29 amino acid residues containing the in vivo cleavage site results
in DEB [190]. Since BMP-1-like proteinases appear to be involved in laminin-5
processing [120, 122], which like collagen VII is located at the dermal-epidermal
junction, and since the 29 residues containing the procollagen VII C-propeptide 
cleavage site contain sequences similar to those at a majority of sites cleaved 
by BMP-1-like proteinases (Fig. 4), procollagen VII became a candidate sub- 
strate for BMP-1-like proteinases. In vitro assays have shown that BMP-1 and 
mTLL-1 are capable of cleaving a truncated recombinant form of procollagen 
VII at the predicted site (Fig. 4), with lesser levels of this activity noted for 
mTLL-2 (mTLD was not tested) [191]. Consistent with an in vivo role for such 
activity, BMP-1, which is capable of cleaving authentic procollagen VII from 
normal keratinocytes, is incapable of cleaving mutant procollagen VII from 
which 29 residues containing the cleavage site have been deleted, and which 
remains uncleaved in tissues of DEB patients. However, collagen VII in the skin 
of Bmp1-null 17.5 dpc fetuses appears to be processed and deposited in skin to 
the same extent as in wild type littermates. It remains unclear at this time 
whether this observation argues against a role for in vivo processing of pro- 
collagen VII by products of the Bmp1 gene, or whether functional redundancy 
by mTLL-1 provides sufficient residual activity to explain the processing of 
procollagen VII in Bmp1-null skin. The latter possibility cannot be directly
tested at this time, as Tll1 null and Bmp1;Tll1 doubly null embryos die on or
prior to 14 dpc [62, 192], at which time the skin basement membrane zone is 
undeveloped and unamenable to immunohistological or biochemical analysis. 
It is also possible that mTLL-2 or an unrelated proteinase(s) contributes to in 
vivo processing of procollagen VII. Creation of conditional knockout mice, in 
which effects of null alleles for the Bmp1, Tll1, and Tll2 genes are limited to
individual tissues, should enable survival of mice to developmental stages that 
will allow testing of which BMP-1-like proteinases or combination of pro- 
teinases may be involved in cleaving procollagen VII in skin. 

Type VII procollagen is produced by keratinocytes, whereas it has been sug- 
gested that the majority of type VII procollagen-cleaving activity is provided 
by fibroblasts at the dermal-epidermal junction [191]. The latter is in contrast 
to indications that laminin-5-processing activity is provided by keratinocytes 
[120]. Clearly, the full range of roles of cells and proteinases involved in form- 
ing the mature laminin-5 and type VII collagen of the dermal-epidermal base- 
ment membrane zone remains to be fully elucidated. 


5 
Processing/Shedding of Cell Surface Collagen Types XIII, XVII and XXV 


Certain collagens are not exclusively ECM components, but rather spend some
portion of their extracellular existences as integral proteins of the plasma
membrane. Three such collagens are types XIII, XVII and XXV. Although these
three collagen types do not share high degrees of sequence homology, each is a
type II transmembrane protein with an NH2-terminal cytoplasmic domain, a
hydrophobic transmembrane domain located in the NH2-terminal portion of
the molecule, and a relatively large extracellular domain [193-195] that is
proteolytically cleaved within sequences closely juxtaposed to the plasma
membrane (Fig. 7).

Fig. 7 Processing of cell surface collagens

A small proportion of type XIII collagen, which has been suggested to play 
roles in binding soluble ligands and/or ECM components to cells [193], is 
cleaved from the surfaces of cultured cells and is found in culture media [196]. 
Recombinant forms of type XIII collagen lacking the small NH2-terminal cytoplasmic 
domain are much more susceptible to cleavage than are full-length forms, 
suggesting some level of control by this domain over processing of extracellular 
sequences [197]. NH2-terminal sequencing of the shed ectodomain shows 
that cleavage occurs immediately COOH-terminal to a consensus recognition 
site for cleavage by furin-like proprotein convertases, and the furin-specific 
inhibitor decanoyl-RVKR-chloromethyl ketone inhibits ectodomain shedding 
[197]. Thus, some portion of type XIII collagen molecules, found on the surface 
of a variety of cell types, seems to be shed via proteolytic cleavage by furin-like 
proteinase(s), although the biological significance of this process is at present 
unknown. 

Collagen XXV is expressed by neurons of the brain, and its shed ectodomain 
binds to fibrils of amyloid β peptide, with which it is co-deposited in senile 
plaques associated with Alzheimer's disease [195]. Collagen XXV ectodomain 
is apparently shed via cleavage by furin-like proteinases, since amino acid 
substitutions within a consensus sequence for cleavage by furin-like proteinases 
inhibit cleavage [195]. 

Type XVII collagen, also known as the 180-kDa bullous pemphigoid antigen 
(BP180), is an integral component of epithelial cell hemidesmosomes, involved 
in securing cells to underlying basement membranes [198]. The importance 
of type XVII collagen is underscored by phenotypes of the disease junctional 
epidermolysis bullosa and autoimmune diseases of the pemphigoid group, in 
which mutations in the type XVII collagen gene and autoantibodies against 
type XVII collagen, respectively, lead to decreased epidermal adhesion and 
severe blistering [199, 200]. The ectodomain of collagen XVII is proteolytically 
shed, and such shedding is inhibited by incubating keratinocytes in the presence 
of decanoyl-RVKR-chloromethyl ketone [201]. However, furin does not 
cleave purified type XVII collagen in vitro, whereas shedding is potentiated by 
culturing cells in the presence of phorbol esters and is inhibited by a specific 
spectrum of metalloproteinase inhibitors, both of which effects fit the profile of 
either MMPs or ADAM family metalloproteinases [202]. Cells from knockout 
mice null for candidate MMPs, such as MT1-MMP and MMP-2, show no di- 
minishment in collagen XVII shedding [202]. Furthermore, cleavage of collagen 
XVII by MMPs produces fragments that differ in size from those produced via 
physiological processing, and a specific gelatinase inhibitor that strongly inhibits 
the activities of candidate MMPs MMP-2 and MMP-9 does not inhibit collagen 
XVII shedding [202]. In contrast, the ADAM family member TACE cleaves 
type XVII collagen to produce fragments identical in size to those produced by 
cells, and keratinocytes from TACE knockout mice show a marked decrease in 
collagen XVII shedding [202]. Thus ADAM family members TACE, ADAM-9 
and ADAM-10, all of which are expressed in keratinocytes [202], are candidates 
for involvement in in vivo processing of collagen XVII. Presumably, the ability 
of decanoyl-RVKR-chloromethyl ketone to inhibit collagen XVII shedding is 
indirect, since furin-like activity is necessary for activation of ADAM family 
members [203]. Further information concerning attributes of the ADAM mem- 
brane-anchored metalloprotease sheddases, which are involved in myriad pro- 
cesses in development and homeostasis, can be found in the review by Becherer 
and Blobel [203]. Although the physiological roles of collagen shedding are un- 
clear, the shed ectodomain of type XVII collagen is bound by keratinocytes, 
and processing of collagen XVII may contribute to altered motility of epithelial 
cells [202]. 

A number of additional type II integral membrane proteins exist that contain 
collagenous regions in their ectodomains [204]. One of these, collagen XXIII, is 
known only via data mining of the genome databases, and whether this collagen 
is proteolytically processed remains to be determined. The remaining transmembrane 
proteins with collagenous domains are not formally collagens [204]. Only 
one of these proteins, ectodysplasin A (EDA), is known at this time to be proteolytically 
processed. Ectodomains of alternatively spliced versions of EDA play 
roles in epidermal development [205] and mutations in the gene that encodes 
EDA result in the genetic condition X-linked anhidrotic/hypohidrotic ectoder- 
mal dysplasia, characterized by impaired development of hair, sweat glands and 
teeth [206]. Proteolytic shedding of EDA appears to be via furin-like convertase 
activity, and mutations that cause single amino acid substitutions within the 
furin site consensus sequence block shedding of the ectodomain and cause 
hypohidrotic ectodermal dysplasia [207]. This latter effect shows processing of 
the ectodomain to be important to the developmental roles of EDA [207]. 


6 
Processing of Multiplexin Collagen Types XV and XVIII 


The two nonfibrillar collagen types XV and XVIII have similar domain struc- 
tures and together have been referred to as the multiplexin (multiple triple- 
helix domains and interruptions) collagen subfamily [208]. Collagens XV and 
XVIII are chondroitin sulfate [210] and heparan sulfate [209] proteoglycans, 
respectively, and both are basement membrane components [211-213]. Func- 
tions of intact collagens XV and XVIII are not well understood, although mice 
null for the collagen XV gene have mild, progressive degeneration of skeletal 
muscle, susceptibility to exercise-induced muscle damage, and abnormal microvasculature 
of the heart and skeletal muscle [214]; and mice null for the collagen 
XVIII gene show subtle defects in the iris and microvasculature of the eye 
[215, 216]. Furthermore, mutations in the human gene for collagen XVIII can 
result in Knobloch syndrome, which is characterized by various eye abnor- 
malities [217]. Of particular interest is the finding that a cleaved subdomain, 
designated endostatin, of the COOH-terminal noncollagenous NC1 domain of 
collagen XVIII, can have potent antiangiogenic effects in certain in vivo settings 
[218]. Although such effects have been demonstrated via administration of 
exogenous recombinant endostatin [218], relatively high levels of endogenous 
endostatin are detected in mouse and human sera [219], suggesting the possi- 
bility of physiological roles. However, endogenous endostatin of normal serum 
exhibits heterogeneous NH2-termini, which differ from the single NH2-terminus 
of the endostatin originally employed in angiogenesis inhibition assays and 
derived from conditioned media of the EOMA line of murine hemangioen- 
dothelioma cells [218]. In fact, endostatin, a compact autonomously folding 
proteinase-resistant domain, is linked to the remainder of type XVIII collagen via a 
proteinase-sensitive hinge region, within which cleavage appears to occur via 
multiple proteolytic pathways [219]. Several lines of evidence suggest that cleavage may be 
a two-step process, with initial cleavage of the entire NC1 domain trimer by 
metalloproteinase(s) and subsequent cleavage within the proteinase-sensitive 
hinge region to release endostatin [220, 221]. Cleaved, intact NC1 domain may 
be retained in basement membranes due to strong non-covalent interactions 
with matrix components, whereas subsequent cleavage may result in the 
diffusion of the isolated endostatin domain, which does not bind as strongly 
to matrix components [219]. Serine elastases are capable of cleaving at the 
same site at which endostatin is cleaved in EOMA cell cultures, and adminis- 
tering elastase inhibitors, such as elastatinal, to EOMA cells in one study 
inhibited the production of endostatin, with a concomitant accumulation of 
intact NC1 domains [220]. Curiously, another study showed that treatment of 
EOMA cells with elastatinal did not inhibit production of endostatin [221]. 
This same study showed cathepsin L to cleave endostatin at the same site em- 
ployed in EOMA cell cultures, showed EOMA cells to secrete cathepsin L, and 
showed that an inhibitor specific to cathepsins inhibits proteolytic production 
of endostatin in EOMA cell cultures [221]. Thus cathepsin L, which has an 
acidic pH optimum, has been suggested to cleave endostatin in the acidic peri- 
cellular microenvironment of tumors [221]. However, different tissues and 
sera contain species of endostatin that vary in their NH2-termini, suggesting 
that various proteinases are involved in processing endostatin in vivo [219]. 
In a survey, 11 of 12 proteinases tested cleaved within the same 15-residue 
span of the proteinase-sensitive hinge region as that in which in vivo cleavages occur, 
and some of the same proteinases were capable of degrading endostatin 
upon longer incubations [222]. Thus, it remains to be determined which pro- 
teinase(s) may be involved in cleaving collagen XVIII in vivo to produce phys- 
iologically functional forms of endostatin, and which are involved in catabolic 
degradation of collagen XVIII or artifactual processing associated with isola- 
tion of endostatin from tissues and culture media. Fragments corresponding 
to the endostatin domain of collagen XV have been isolated from human 
blood filtrate [223] and detected in mouse tissues [224], suggesting possible 
physiological significance. However, although the collagen XV endostatin-like 
fragment has been shown to have anti-angiogenic effects in assays [224, 225], 
the nature of the proteolytic processing of this fragment has not been explored 
at this time. 


7 
Concluding Remarks 


Processing of the major and minor fibrillar procollagens by mammalian 
BMP-1-like proteinases likely coordinates fibrillogenesis with other biosynthetic 
events in vivo, since the same proteinases are responsible for the processing of 
other structural molecules and for regulating signaling by certain TGF-β-related 
growth factors and morphogens. Uncovering the full range of processes to 
which fibrillogenesis is linked will require identification of additional in vivo 
substrates of the BMP-1-like proteinases. Such identification will require, in 
addition to the candidate substrate approach, high-throughput methodologies, 
such as mass-spectrometric analysis of cleavage products produced by wild-type 
cells, but not by the cells of mice null for various combinations of BMP-1-like 
proteinase genes. High-throughput screens may also be of use in comparison 
of cleavage products in untreated normal cell cultures and cultures of the 
same cells treated with small molecule inhibitors reportedly specific for the 
mammalian BMP-1-like proteinases [120, 226]. 

Clearly, the same types of high-throughput methodologies can be applied 
for the pNPs, using knockout mice null for various combinations of the genes 
for ADAMTS-2, -3 and -14, such that functional redundancies will have been 
removed, or using highly specific inhibitors, should they become available. 
Additional high-throughput screens, such as phage display with cDNA expression 
libraries representing various tissues, should also identify novel binding 
partners of the BMP-1-like and ADAMTS proteinases. Such binding partners 
could include novel substrates, and possible endogenous inhibitors and en- 
hancers of proteinase activity. Another future goal is to define physiological 
roles for cleavage/shedding of cell surface collagens, perhaps via use of knockin 
mutations that destroy relevant cleavage sites. Clearly, further understanding 
of the processing of collagens is important, as such processing represents 
important control points in the regulation of collagen functions and in the 
integration of collagen biosynthesis with other in vivo events. 
Abstract Collagens are a family of proteins with considerable molecular diversity. The genetic 
diversity is increased by assembly of multiple isoforms, alternative splicing and post-trans- 
lational modifications. All collagen molecules share common features, including having at 
least one domain composed of three polypeptide chains organized as a triple helix. However, 
these molecules assemble to form a variety of supramolecular structures, e.g., fibrils, 
filaments, or networks, responsible for the characteristics of specific extracellular matrices. 
These supramolecular structures are alloys or macromolecular composites, containing 
different collagen types as well as other matrix macromolecules. This heteropolymeric 
composition provides an additional level of complexity. Current concepts of collagen supra- 
structures, their assembly and function within extracellular matrices will be the focus of this 
review. 


Keywords Collagen supramolecular structures · Extracellular matrix · Fibrillar collagens · 
Network-forming collagens · FACIT collagens 
1 
Introduction 


At least 27 known types of collagens constitute a family of glycoproteins that 
occur in extracellular matrices of vertebrates and invertebrates with different 
suprastructural organizations (Table 1). All collagens are trimers, each com- 
posed of one, two or three distinct gene products. The molecular diversity of 
collagen polypeptides is enhanced further by alternative promoters and 
mRNA splicing as well as by the possibility of combining collagenous polypeptides 
in alternative ways, which, for some collagen types, results in the formation 
of isoforms. Finally, collagen molecules are subject to a variable extent 
of post-translational modification and/or proteolytic processing, modulating 
even further their molecular architectures and surface properties (for review, 
see the chapter by Ricard-Blum). 

Collagen molecules have in common at least one domain comprising three 
polypeptide chains folded into collagen-like triple helices. This hallmark feature 
renders collagen molecules well adapted to assemble into highly multimeric 
suprastructural aggregates, thereby converting protomeric, largely nonfunctional 
molecules into their operative state. In this respect, it is particularly intriguing 
that one and the same collagen type can predominate, sometimes overwhelm- 
ingly, in tissue aggregates that otherwise are strikingly diverse. This is well 
exemplified by collagen I, the major protein of banded fibrils with vastly dis- 
similar tissue-specific organizations. The most likely explanation for this para- 
dox is the fact that most, if not all, collagen-containing suprastructures have 
complex macromolecular compositions that not only include other collagen 
types, but also non-collagenous components. These additional macromolecules 
may be substantial or occur in minute quantities. Invariably, however, they 
decisively influence tissue-specific architectures and functions. This review will 
outline our current understanding of the compositional requirements of 
supramolecular assemblies containing collagens to yield fibrils, microfibrils, 
and/or networks. We shall dwell less on the molecular collagen structures, a 
subject covered elsewhere in this volume as well as in recent excellent reviews 
[1-3]. Tissue suprastructures, like metal alloys, are uniform materials comprising 
several molecular species and with properties distinct from those of 
unimolecular aggregates. This is well exemplified by cartilage fibrils (see below). 
However, collagen-containing suprastructures also can be composites of 
more than one alloy or unimolecular assembly. As also detailed below, basement 
membranes represent good examples of this mode of aggregation. In the 
following, we shall discuss current concepts of the supramolecular structure, 
assembly and function of collagens within extracellular matrices. 


Table 1 Collagen suprastructures 

Suprastructure                 Collagen types 

Fibril                         I, II, III, V, XI, XXIV, XXVII 
Fibril-associated (FACIT*)     IX, XII, XIV, XVI, XIX, XX, XXI, XXII 
Network                        IV, VI, VIII, X 
Anchoring fibrils              VII 
Transmembrane collagens        XIII, XVII, XXIII, XXV 
Multiplexin                    XV, XVIII 

* Fibril-associated collagens with interrupted triple helices. 


2 
Fibrillar Collagens 


2.1 
Polymerization of Fibrillar Collagens 


The fibril-forming collagens, i.e., I, II, III, V, XI, XXIV and XXVII, are rod-like collagens 
with a large triple helical domain (ca. 300 nm) containing 990 to 1020 
amino acids per polypeptide chain. It is now clear that these collagens co-assemble 
into banded fibrils in tissues including bone, dentin, tendons, cartilage, 
dermis, sclera, cornea, and the interstitial connective tissues in and around many 
organs. They are arranged in longitudinally staggered arrays of molecules of a 
length that is a non-integer multiple of the stagger between next neighbors. 
Thus, a gap occurs sequentially between neighboring molecules giving rise to 
a gap-overlap structure in all collagen fibrils with a D-periodic banding as 
schematically represented in Fig. 1. In addition, the longitudinal axis of the fib- 
rillar collagen molecules is not parallel to that of the fibrils. It still is unclear 
whether the molecules are supercoiled around each other or whether they are 
crimped within the fibrils [4]. It is possible that, depending on the type and the 
origin of fibrils, both forms of organization exist. The relative abundance of the 
collagen types in fibrils is tissue- or tissue domain-specific and, in some cases, 
may be very small. For example, collagen III has been reliably identified in 
certain bone fibrils by immunoelectron microscopy [5], but was not detectable 
among the collagen proteins extracted from that tissue. Either collagen III is 
too scarce to be detected by protein chemical techniques or, more likely, it 
is insoluble owing to the highly abundant covalent cross-linking. Nevertheless, 
collagen III is likely to affect fibrillar assembly and structure because its triple 
helical domain is longer than that of type I, the major bone collagen. The triple 
helical domain of collagen III is composed of 340 amino acid triplets Gly-Xaa-Yaa 
per chain, whereas that of collagen I has only 338. In addition to type III, a 
bone-specific variant of collagen V/XI has been identified in bone [6]. Likewise, 
the molecular composition of skin fibrils is complex even in terms of fibrillar 
collagens. Again, the major collagen is type I, which is co-polymerized with 
substantial amounts of collagen III in a developmentally regulated and/or tissue 
domain-dependent manner. In addition, lesser amounts of other fibrillar 
collagens, mainly type V, also are present in skin fibrils. Again, it is likely that 
distinct fibrils arise from these mixtures and that the distinctions depend on 
the molar fractions of the composite fibrillar collagens. 


188 D. E. Birk - P. Bruckner 


Fig. 1 Structure of a generic collagen fibril. A D-periodic collagen fibril from tendon is 
presented at the top of the panel. The negatively stained fibril has a characteristic alternating 
light/dark pattern representing the gap (dark) and overlap (light) regions of the fibril. The 
diagram represents the staggered pattern of collagen molecules giving rise to this D-periodic 
repeat. The collagen molecules (arrows) are staggered N to C. The fibrillar collagen molecule 
is approximately 300 nm (4.4 D) in length and 1.5 nm in diameter. 
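
The chain-length figures quoted above can be checked with simple arithmetic. The following sketch converts the triplet counts given in the text into residues per chain and, assuming an axial rise of about 0.286 nm per residue (a commonly cited value for the collagen triple helix that is not stated in this chapter), into approximate helix lengths. All function and constant names are illustrative only.

```python
# Assumed axial rise per residue in a collagen triple helix (nm).
# This value is an assumption brought in for illustration, not taken from the text.
RISE_PER_RESIDUE_NM = 0.286

# Gly-Xaa-Yaa triplets per chain in the triple helical domain, as quoted in the text.
TRIPLETS = {"collagen I": 338, "collagen III": 340}


def helix_residues(triplets: int) -> int:
    """Residues per chain in the triple helical domain (three per triplet)."""
    return 3 * triplets


def helix_length_nm(triplets: int, rise_nm: float = RISE_PER_RESIDUE_NM) -> float:
    """Approximate helix length under the assumed rise per residue."""
    return helix_residues(triplets) * rise_nm


for name, n in TRIPLETS.items():
    print(f"{name}: {helix_residues(n)} residues, about {helix_length_nm(n):.0f} nm")

extra = helix_residues(TRIPLETS["collagen III"]) - helix_residues(TRIPLETS["collagen I"])
print(f"collagen III helix is longer by {extra} residues "
      f"(about {extra * RISE_PER_RESIDUE_NM:.1f} nm under the assumed rise)")
```

Both counts (1014 and 1020 residues) fall within the 990 to 1020 residues per chain quoted for the fibrillar collagens, and under the assumed rise the six extra residues of collagen III correspond to less than 2 nm of additional helix.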

One supramolecular characteristic of banded collagen fibrils is the length 
of the stagger between adjacent fibrillar collagen molecules (D-period) that is 
tissue-specific (e.g., 67 nm in rat tail tendon and 64 nm in human dermis) [7]. 
Another variable between tissues is the tilt of the long axis of collagen mole- 
cules with respect to the longitudinal axis of the fibril. Variations of this kind 
and magnitude necessitate substantial differences in molecular organization, 
including longitudinal D-periodic packing as well as intermolecular distances. 
It has recently been shown that very small quantities of collagen XI (less than 
1 part in 1000) can profoundly alter the fibrillar organization of collagen I. In 
addition, fibril formation by pure collagen I in vitro is remarkably slow and in- 
compatible with the requirements of fibrillogenesis in situ because nucleation 
in the absence of collagen V/XI represents a formidable kinetic barrier against 
the assembly process. Thus, considerable lag periods of aggregation ensue. 
However, the quantitatively minor collagen XI, together with collagen I, can 
form a uniform core of the fibril that efficiently nucleates further lateral growth 
by accretion of collagen I. Thus, collagen I/XI-fibrils are composites containing 
alloyed cores and sheaths of pure collagen I. Their final morphology is variable 
and depends on the collagenous composition [8]. Extrapolating this concept to 
banded fibrils in general, it has now become clear that the mixtures of fibrillar 
collagens, i.e., types I, II, III, V/XI, XXIV and XXVII, act as key regulators of the 
collagen organization in fibrils, resulting in different modes of supercoiling or 
crimping and, hence, the length of the D-periodic banding. 
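
The geometric relation between molecular length, D-period and the gap-overlap pattern can be made concrete with a small calculation. The sketch below is a deliberately simplified model, not the authors' analysis: it assumes rigid, straight molecules of 300 nm aligned with the fibril axis, and, in a second helper, it assumes that a shorter D-period arises purely from axial projection of uniformly tilted molecules. Both assumptions and all function names are hypothetical and serve only to illustrate the reasoning in the text.

```python
import math


def gap_overlap(molecule_nm: float, d_period_nm: float):
    """Return (length in D units, overlap in nm, gap in nm).

    Assumes straight rods parallel to the fibril axis, staggered by one
    D-period, with a molecular length that is a non-integer multiple of D.
    """
    n_full = math.floor(molecule_nm / d_period_nm)   # whole D-periods spanned
    overlap = molecule_nm - n_full * d_period_nm     # remainder forms the overlap zone
    gap = d_period_nm - overlap                      # rest of each period is the gap zone
    return molecule_nm / d_period_nm, overlap, gap


def tilt_for_d_period(d_observed_nm: float, d_reference_nm: float = 67.0) -> float:
    """Tilt angle (degrees) that would shorten the reference D-period to the
    observed value if axial projection were the only effect (an assumption)."""
    return math.degrees(math.acos(d_observed_nm / d_reference_nm))


for tissue, d in (("rat tail tendon", 67.0), ("human dermis", 64.0)):
    length_d, overlap, gap = gap_overlap(300.0, d)
    print(f"{tissue}: L = {length_d:.2f} D, overlap = {overlap:.0f} nm, gap = {gap:.0f} nm")

print(f"tilt that would give D = 64 nm by projection alone: {tilt_for_d_period(64.0):.1f} degrees")
```

For a 300 nm rod this simple model gives an overlap of 32 nm and a gap of 35 nm at D = 67 nm. Keeping the rod untilted while shortening D to 64 nm would shift these values to 44 nm and 20 nm, which illustrates why a tissue-specific D-period implies real differences in molecular packing. Under the projection assumption, a tilt of roughly 17 degrees would account for the shorter period instead.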


2.2 
Influence of Post-Translational Processing of Collagens on Fibril Structure 


The covalent modifications occurring after polypeptide synthesis are particu- 
larly prominent in collagens and are likely to have a profound impact on the 
assembly of fibrillar collagens [9-11], as well as FACITs. The cylindrical cir- 
cumference of triple helical domains of collagens is affected by the varying 
extent of hydroxylation of lysyl residues and subsequent galactosylation and 
glucosyl-galactosylation of hydroxylysyl residues. Thus, intermolecular center- 
to-center distances correlate with the extent of glycosylation, especially if the 
post-translational modifications affect polypeptide parts eventually situated 
in overlap regions of the collagen packing. The extent of glycosylation of hydro- 
xylysine can be manipulated by temperature [9], activity levels of modifying 
enzymes in the relevant subcellular compartments [12], or, most notably, by 
disease-causing mutations [2]. Such mutations can substantially reduce the 
rates of triple helix-formation in fibrillar procollagens in the rough endoplas- 
mic reticulum and cause overmodification because the hydroxylating and gly- 
cosylating enzymes reside in that compartment and only recognize unfolded 
collagen-like polypeptides as their substrate. Collagen molecules that are mu- 
tated and overmodified in this manner co-polymerize with normal molecules, 
thereby compromising normal fibrillar organization. This molecular relation- 
ship of genotypes and phenotypes has thus been designated as "protein sui- 
cide". However, differences in the extent of glycosylation also can be a mode of 
physiological regulation of fibrillar organization. For example, collagen I is 
known to be glycosylated in regions residing in the overlap domains of banded 
fibrils if the protein is a product of corneal, but not tendon fibroblasts. 

An attractive possibility for the regulation of fibril morphology is that the availability 
of protomers generated by enzymatic cleavage may control fibril growth and shape 
[13]. However, it is not easy to see how a kinetic control can determine long-term 
stability of fibrillar assemblies. The heterotypic fibril assembly model also 
involving non-collagenous components, such as the small leucine-rich proteo- 
glycan decorin, has the advantage of introducing the possibility of thermody- 
namic shape control [14]. Nevertheless, the two concepts may act in concert. 


3 
Fibril-Associated Collagens with Interrupted Triple Helices (FACIT) 


The supramolecular complexity of D-banded fibrils is augmented further by 
the incorporation of fibril-associated collagens with interrupted triple helices 
(FACIT). This collagen subfamily comprises the types IX, XII, XIV, XVI, XIX, 
XX, XXI, XXII, and XXVI, and its members are structurally very diverse. They 
only have in common a FACIT-domain, i.e., a relatively short, carboxy-terminal 
triple helical stretch flanked by a cysteine-containing motif, GXCXXXC, at the 
junction of the triple helix and the carboxy-terminal non-helical region. The 
cysteines are essential for the covalent bonding of the three constituent 
polypeptides. It has been proposed that the FACIT-domain of collagen IX may 
be incorporated into the gap between consecutive fibrillar collagen molecules 
in cartilage fibrils [15]. Although this has not formally been proven, it is an attractive 
possibility that other FACITs are integrated into corresponding fibrils 
in other tissues and, therefore, provide an opportunity for tissue-specific mod- 
ulation of the fibril surfaces by projection of the non-FACIT-domains into the 
perifibrillar space. The selective expression of the FACITs would thus provide 
an elegant molecular mechanism for molding of fibril surfaces. Indeed, it has 
been shown that the amino-terminal regions of collagens IX, or XII and XIV 
protrude from the fibril surfaces in cartilage or skin and tendon, respectively 
[5, 16]. The biomechanical diversity of banded fibrils may thus be a direct 
consequence of distinctions of surface properties afforded by FACITs. 

At least in the case of the prototypic FACIT, collagen IX, triple helical domains 
other than the carboxyterminal FACIT-domain also are incorporated into the 
fibril body, possibly by an antiparallel alignment with the fibrillar collagens of 
cartilage (see below). In doing so, FACITs become part of the fibril bodies, very 
much like the fibrillar collagens, thereby modulating further and/or stabilizing 
the molecular organization within fibrils. However, the suprastructural associ- 
ation with fibrillar collagens is not a generalized feature of all FACITs. For 
example, collagen XVI can be an optional constituent of banded fibrils in carti- 
lage (see below). In skin, however, the protein never is incorporated into banded 
fibrils, but rather is a component of fibrillin-containing microfibrils. Recently, we 
showed that in tissue junctions collagen XXII also coexists with fibrillin-containing 
suprastructures rather than banded fibrils [17]. Thus, FACITs are not 
always associated with banded fibrils and, where they are, they can be integral 
parts and important organizers of the overall fibril structure rather than 
optional additions to preexisting aggregates. 


4 
Network-Forming Collagens 


4.1 
Collagen IV Networks 


Basement membranes are composites consisting of several independent, but 
intertwined supramolecular networks. One of these networks contains collagen 
IV that comes in several isoforms, whereas the others harbor as their major 
molecular components laminin, also existing in several isoforms, or perlecan, 
respectively. Further macromolecules, including nidogen/entactin, have been 
shown to mediate molecular contacts, thereby stabilizing the compound macromolecular 
networks in basement membranes. Distinct basement membranes 
occur in different anatomical locations or may be sequentially formed during 
development. This also is reflected by the subtypes of collagen IV that are 
differentially expressed under distinct circumstances. There are six collagen 
IV-encoding genes, i.e., COL4A1 through COL4A6, giving rise to the corresponding 
polypeptides alpha1(IV) through alpha6(IV). There is only a limited 
set of heterotrimeric molecules folded into triple helical molecules with the 
stoichiometries [alpha 1(IV)]2 alpha 2(IV), alpha 3(IV) alpha 4(IV) alpha 5(IV), 
and [alpha 5(IV)]2 alpha 6(IV), respectively. Other chain combinations apparently 
do not exist. Collagen IV molecules aggregate into networks by interactions 
between their N-terminal, triple helical domains, called 7S-domains, 
organizing 4 similar collagen IV molecules in an antiparallel fashion. From such 
knots formed by 7S-domains the long collagen IV-triple helices stretch out into 
all spatial directions. This flexibility is afforded by interruptions in the typical 
(Gly-X-Y)n-sequences that, unlike in fibrillar collagens, occur abundantly in 
collagen IV molecules, including a site between the small 7S- and the long 
triple helices. Such interruptions create a point of flexibility in an otherwise 
stiffly rod-like molecule. At their C-terminus, non-collagenous NC1-domains 
interact head-to-head, creating in conjunction with the 7S-interactions large 
supramacromolecular aggregates resembling chicken-wire-like interwoven 
networks. In the interaction between collagen IV molecules involving NC1-domains, 
heterotypic arrangements are possible. [alpha 1(IV)]2 alpha 2(IV) 
trimers can interact with [alpha 5(IV)]2 alpha 6(IV) trimers by interactions 
between alpha1- and alpha5-, as well as alpha 2- and alpha 6-NC1 domains. By 
contrast, alpha 3(IV) alpha 4(IV) alpha 5(IV) trimers interact through their NC1-domains 
to yield pairs of two alpha 4(IV)-NC1-domains or alpha 3(IV)-alpha 5(IV)-NC1 
heterodimers. Thus, heterotypic networks arise with distinct 
supramolecular structures and propensities to undergo further aggregation 
reactions. Such chicken-wire-like networks undergo further supramolecular 
assembly by laterally aggregating into extended polygonal networks with widely 
variable mesh-sizes. 

Suprastructures formed by various combinations of collagen IV-isoforms 
can undergo further molecular interactions with basement membrane-associated 
macromolecules. These include collagen VII-containing anchoring fibrils 
in the dermo-epidermal junction zone or collagen XVIII/endostatin-containing 
filamentous suprastructures associated with basement membranes underlying 
the retinal pigment epithelium or in Bruch's membrane in the eye as well 
as in blood vascular endothelial basement membranes [18]. Mutations in collagens 
VII and XVIII destabilizing the interactions of these molecules with basement 
membrane components lead to dermal blistering diseases (dystrophic 
epidermolysis bullosa) or to Knobloch syndrome, a rare disease characterized 
by severe ocular alterations, including vitreoretinal degeneration associated 
with retinal detachment and occipital scalp defect [19]. In each of these cases, 
basement membrane zones are destabilized by the mutations. 
4.2 
Collagen VI Networks 


Collagen VI is a ubiquitous component of connective tissues. It is found as an 
extensive filamentous network with collagen fibrils and is often enriched in peri- 
cellular regions. It is assembled into several different tissue forms, including 
beaded microfibrils, hexagonal networks and broad banded structures [20, 21]. 
Collagen VI interacts with a spectrum of extracellular molecules including 
collagen types I, II, IV, XIV, microfibril-associated glycoprotein (MAGP-1), 
perlecan, decorin, biglycan, hyaluronan, heparin and fibronectin as well as integrins 
and the cell-surface proteoglycan NG2. Based on the tissue-localization and 
large number of potential interactions, collagen VI has been proposed to integrate 
different components of the extracellular matrix, including cells [22]. In addition, 
collagen VI may influence cell migration, differentiation and apoptosis/prolifer- 
ation. This indicates a role(s) in the development of tissue-specific extracellular 
matrices, repair processes and in the maintenance of tissue homeostasis. 

Collagen VI is a heterotrimer composed of alpha1(VI), alpha2(VI) and 
alpha3(VI) chains [22, 23]. The type VI monomer has a 105 nm triple helical 
domain with flanking N- and C-terminal globular domains. The N-terminal 
domain is almost exclusively from the alpha3(VI) chain and has approximately 
twice the molecular mass of the C-terminal domain. The N- and C-terminal 
domains are composed of varying numbers of von Willebrand type A repeats. 
The alpha3(VI) N-terminal domain is larger than the comparable domains in 
the alpha1(VI) and alpha2(VI) chains, with a maximum of ten type A repeats 
vs one. The alpha3(VI) C-terminal domain has five type A repeats vs two for the 
alpha1(VI) and alpha2(VI) chains. The terminal type A repeat of the alpha3(VI) 
chain can be processed extracellularly. Structural heterogeneity is introduced by 
alternative splicing of domains, primarily of the alpha3(VI) N-terminal domain. 
This domain is expressed in several different forms giving rise to structural and 
functional heterogeneity. 

A distinct property of collagen VI is that assembly of the supramolecular 
forms begins in the lumen of intracellular compartments (Fig. 2). The initial 
assembly step is dimer formation via the lateral, anti-parallel association of two 
monomers. The monomers are staggered by 30 nm with the C-terminal domains 
interacting with the helical domains. This gives rise to an overlapped, central 
75 nm helical domain flanked by a non-overlapped region with the N- and 
C-globular domains, each about 30 nm. These interactions are stabilized by 
disulfide bonds near the ends of the overlapped region [24]. The overlapped 
helices of the two monomers form a supercoil in the central region [25]. In the 
next step, tetramers form when two dimers align with the ends in register. The 
second (C2) C-terminal type A repeat in the alpha2(VI) chain is critical for 
dimer and subsequent tetramer formation. This involves an interaction between 
the C2 region of one alpha2(VI) chain and the helical region of the anti-parallel 
alpha2(VI) chain [26]. Tetramers are secreted and are the building blocks that 
assemble extracellularly into the tissue forms of type VI collagen. 
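
The dimer dimensions quoted above follow from simple arithmetic on the helical length and the stagger. The sketch below is only an illustration under stated assumptions: the helical domains are treated as idealized rigid rods of equal length, which the text does not claim, and the function name `dimer_geometry` and its dictionary keys are hypothetical.

```python
def dimer_geometry(helix_nm: float, stagger_nm: float) -> dict:
    """Geometry of an idealized anti-parallel dimer of two equal rods.

    Assumption (not from the source): each triple helical domain is a
    rigid rod, and the two rods are offset along their axis by stagger_nm.
    """
    if not 0.0 <= stagger_nm <= helix_nm:
        raise ValueError("stagger must lie between 0 and the helix length")
    return {
        # Region where both helices lie side by side
        "overlap_nm": helix_nm - stagger_nm,
        # Non-overlapped helical stretch at each end of the dimer
        "overhang_each_end_nm": stagger_nm,
        # End-to-end extent of the helical part of the dimer
        "total_span_nm": helix_nm + stagger_nm,
    }


# Values quoted in the text: 105 nm helix, 30 nm stagger
geometry = dimer_geometry(helix_nm=105.0, stagger_nm=30.0)
print(geometry)
```

With the quoted values, the overlap is 105 - 30 = 75 nm, matching the central overlapped helical domain described in the text.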

Fig. 2A, B Assembly of collagen VI suprastructures: A collagen VI forms tetramers intracellularly. 
The collagen VI monomer assembles from 3 alpha chains and has a C-terminal 
globular domain, a central triple helical domain and an N-terminal globular domain. The 
monomers assemble N-C to form dimers. Tetramers are assembled from two dimers aligned 
in register. The tetramers are secreted and form the building blocks of three different 
collagen VI suprastructures; B beaded filaments, broad banded fibrils and hexagonal 
lattices form via end-to-end interactions of tetramers and varying degrees of lateral asso- 
ciation. (Diagrams modified from [28, 30]) 


In the extracellular environment, collagen VI tetramers assemble to form the 
suprastructures found in tissues. Tetramers associate end-to-end forming 
beaded filaments. This is a non-covalent interaction that is presumably mediated 
through Type A domain interactions. This gives rise to thin, beaded filaments 
(3-10 nm) with a periodicity of approximately 100 nm. These beaded filaments 
laterally associate, forming beaded microfibrils [20, 27, 28]. In addition to beaded 
microfibrils, other type VI collagen-containing supramolecular structures are 
found in the extracellular matrix, including hexagonal lattices and broad banded 
fibrils with a 100 nm periodicity. The hexagonal lattices are formed via end-to-end 
interactions of tetramers in a non-linear fashion, while the broad banded 
fibrils probably represent continued lateral growth of beaded microfibrils 
and/or lateral association of preformed beaded microfibrils (Fig. 2). 

Supramolecular aggregates of collagen VI are composite structures. As discussed 
above, collagen VI interacts with a large number of extracellular matrix 
molecules. Comparable to collagen fibrils, supramolecular aggregates con- 
taining collagen VI are composite structures with other integrated molecules 
modulating the functional properties of the type VI collagen-containing supras- 
tructure. For example, biglycan and decorin bind to the same triple helical site 
near the N-terminus [29]. This interaction is mediated by the core protein and 
the presence of the glycosaminoglycan chain(s) had no effect on binding. Bi- 
glycan interactions with the tetramer induced formation of hexagonal lattices 
rather than beaded microfibrils. This was dependent on the presence of the 
glycosaminoglycan chains. In contrast, decorin, which binds to the same site, was 
less effective in inducing hexagonal lattice formation [30]. Analogous to fibril 
formation, the interaction of small leucine-rich proteoglycans with collagen VI 
influences the structure of the tissue aggregate and therefore its function. In 
addition, this regulation involves two closely related, class I, leucine-rich 
proteoglycans that compete for the same binding site. In cartilage, the expres- 
sion of biglycan is enriched in the pericellular environment while decorin is 
enriched in the territorial matrix. This provides a mechanism to assemble 
different suprastructures in adjacent regions or tissues with different functions. 
Coordinate changes in expression patterns could influence matrix assembly 
during development and repair. Abnormal changes in expression, in turn, 
would alter the tissue-specific suprastructure and may lead to tissue pathology. 
Furthermore, collagen VI-associated decorin or biglycan can form complexes 
with matrilin-1. In the cartilage matrix, these complexes mediate interactions 
between the collagen VI network, the fibrillar network and the aggrecan network 
[31]. This illustrates another mechanism whereby the composite structure 
of collagen suprastructures contributes to defining the specific structure/function 
associated with different tissues. 


4.3 
Collagen VIII and X Networks 


Collagens VIII and X assemble to form hexagonal networks in tissues. These 
collagens are closely related, with comparable gene and protein structures [1, 22, 
32]. Collagen VIII is a major component of blood vessels, where it is located 
subendothelially, and of Descemet’s membrane, which separates the corneal 
endothelium from the stroma [33, 34]. Descemet’s membrane is composed of layers of 
hexagonal lattices [35]. These lattices are suprastructures containing collagen VIII [34]. 
Collagen X has a very restricted distribution and is found only in hypertrophic 
cartilage. Its supramolecular form is a hexagonal lattice 
similar to that formed by collagen VIII [36]. Collagen VIII is a homo- or 
heterotrimer of alpha1(VIII) and alpha2(VIII) chains, while collagen X is composed 
of a single type of chain, alpha1(X). There is evidence that both collagen VIII 
homotrimers and the [alpha1(VIII)]2alpha2(VIII) heterotrimer exist in tissues 
[37, 38]. 

Fig. 3A, B Assembly of collagen VIII hexagonal lattices: A the C-terminal, non-collagenous 
domains of four collagen VIII molecules interact to form tetrahedrons. Tetrahedrons 
assemble further to form hexagonal lattices; B a planar hexagonal lattice is diagrammed. 
Continued assembly, involving interactions of the amino terminal non-collagenous domains 
or anti-parallel interactions involving both helical and terminal domains would generate 
a layered hexagonal lattice. (Model modified from [39]) 

Recently, collagen VIII lattices have been assembled in vitro [39]. These in 
vitro lattices are comparable to those seen in tissues and a model for assembly 
of the collagen VIII suprastructure was proposed (Fig. 3). Collagen VIII mole- 
cules form a tetrahedron through the interaction of four molecules. The inter- 
action is proposed to involve the carboxy non-collagenous domains that have 
a conserved hydrophobic patch [40]. This structure is proposed as the building 
block that assembles into three-dimensional hexagonal lattices. The assembly 
of a layered hexagonal lattice could involve interaction of the amino terminal 
non-collagenous domains or anti-parallel interactions involving both helical 
and terminal domains. The anti-parallel interactions are consistent with the 
thicker inter-nodal struts observed and it is predicted that this is the primary 
mechanism for formation of hexagonal lattices both in vitro and in tissues. 


5 
Assembly of Collagen Suprastructures in Tissues 


5.1 
Cartilage Fibril Formation 


D-periodically banded (D=64 nm) fibrils in cartilage [41] constitute the major 
fraction of the dry weight of the matrix in this tissue and are the main tensile 
element containing the swelling pressure generated by osmotic binding of 
water to the highly polyanionic glycosaminoglycan chains of the extrafibrillar 
matrix. Two major populations of fibrils exist in cartilage. There are thin fib- 
rils with a uniform diameter of about 20 nm that occur throughout the extra- 
cellular matrix of all hyaline cartilages. They are particularly enriched in the 
territorial matrix where they have a preferential orientation parallel to the 
surface of the chondrocytes. Thus, the fibrils embed and separate individual 
chondrocytes by forming basket-like structures [42]. The second population, 
consisting of wider fibrils, occurs almost exclusively in specialized matrix 
compartments, termed interterritorial regions, and is more remote from the chondrocytes. In growth 
plates and in articular cartilage, their preferential orientation is parallel to the 
long axis of the bones and the direction of forces generated by load bearing. It 
is unclear by what mechanism the wider fibrils are formed, but it is probable 
that the thin territorial fibrils correspond to the prototypic cartilage fibrils that, 
after appropriate processing, can undergo fusion to form the larger interterri- 
torial fibrils. The other lateral growth mechanism, i.e., the direct apposition of 
collagens and other macromolecules to pre-existing thin fibrils, is less likely 
to operate since the cells producing the macromolecular fibril constituents 
are separated by large distances from the thick and well banded fibrils of the 
interterritorial matrix. This would necessitate extensive diffusion of fibril 
macromolecules through the dense network of cartilage matrix. 

Cartilage fibrils exquisitely illustrate the concept of matrix aggregates as 
macromolecular composites/alloys. Their quantitatively major component is the 
fibril-forming collagen II, a protein that typically, but not exclusively, occurs in 
all types of cartilage. The designation of collagen II as a “fibril-forming” colla- 
gen is appropriate in the sense that the protein is an essential fibril component 
in cartilage. However, the protein, by itself, is incapable of forming fibrils of the 
extensive lengths required in the tissues. Instead, collagen II forms tactoidal 
structures with a strong D-banding pattern and with very limited lengths when 
the pure protein is subjected to aggregation in vitro. The tactoids essentially 
consist of two tapering ends joined together back-to-back. In addition, they are 
only formed at very high initial concentrations of monomeric collagen II and 
lack any obvious lateral growth control. It is most likely that, for these 
reasons, collagen II never occurs as a single protein in cartilage fibrils in situ. 
Rather, the protein always occurs in macromolecular composites. In the case of 
the prototypic, territorial fibrils, the collagenous components include collagens 
II, IX, and XI [43] or, more rarely, collagens II, XI, and XVI [44]. When subjected 
to fibrillogenesis in vitro, mixtures of collagens II and XI alone exhibit an out- 
standing capability of forming thin and uniform fibrils with a diameter of about 
20 nm and closely resembling cartilage fibrils in the electron microscope. In- 
terestingly, this tight diameter control is observed only when collagens II and XI 
are present at molar ratios [collagen II]/[collagen XI] < 8, a proportion 
strikingly similar to that occurring in authentic prototypic fibrils. Thus, the 
mode of lateral packing in prototypic fibrils is jointly dictated by collagens II 
and XI forming a uniform mixture and, for this reason, can be likened to a 
metal alloy. However, the fibrils formed in vitro by collagens II and XI, alone, 
appear to be somewhat less tightly packed than prototypic fibrils. In addition, 
they are less stable in that the two collagens lose their aggregating capacity 
upon prolonged standing without demonstrable proteolytic alteration. However, 
this loss of competence for fibrillogenesis is readily rescued by the addition 
of collagen IX to the reconstitution mixtures. Thus, collagen IX is essential for 
the overall formation of cartilage fibrils with long-term stability rather than 
being a "decorative" addition to aggregates preformed by the other two collagen 
types. Therefore, collagen IX is to be considered as the third collagenous com- 
ponent of the macromolecular alloy in cartilage fibrils and can be important 
during assembly or after assembly in a tissue-specific manner [45]. Such com- 
posite fibrils comprise a heterotypic fibril body encompassing parts of all three 
collagen types from which collagen IX can project to the exterior its aminoter- 
minal globular NC4-domain with the collagenous region Col3 serving as a 
spacer. The protein also may interconnect individual cartilage fibrils in the 
tissue [15, 46]. 
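
The stoichiometric condition for tight diameter control can be restated as a minimum share of collagen XI in the binary mixture. The sketch below reads the condition in the text as a molar ratio [collagen II]/[collagen XI] below 8, which is an interpretation; the function names and the default threshold argument are hypothetical helpers, and collagen IX is ignored for simplicity.

```python
def xi_mole_fraction(ratio_ii_to_xi: float) -> float:
    """Mole fraction of collagen XI in a binary II/XI mixture.

    With r = [collagen II]/[collagen XI], the fraction of XI is 1 / (1 + r).
    Collagen IX and all other components are ignored (simplifying assumption).
    """
    if ratio_ii_to_xi < 0:
        raise ValueError("molar ratio cannot be negative")
    return 1.0 / (1.0 + ratio_ii_to_xi)


def within_diameter_control(ratio_ii_to_xi: float, threshold: float = 8.0) -> bool:
    """True if the II/XI molar ratio is below the threshold quoted in the text."""
    return ratio_ii_to_xi < threshold


# At the limiting ratio of 8, collagen XI makes up 1/9 of the II/XI mixture.
limiting_fraction = xi_mole_fraction(8.0)
print(round(limiting_fraction, 3))
```

Under this reading, uniform thin fibrils would be expected only when collagen XI exceeds roughly one ninth (about 11%) of the collagen II/XI mixture on a molar basis.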

The macromolecular components of cartilage fibrils are unlikely to be 
restricted to collagens. When subjected to electrophoresis in polyacrylamide 
gels after extensive denaturing treatment (SDS, reducing agents, boiling for 
extended periods of time), prototypic fibrils generate diffuse electrophoretic 
patterns resembling those of proteoglycans. These smears arise from abundant 
and tightly bound fibril constituents that, unlike the collagens, are susceptible 
to degradation by certain exogenously added proteinases [16]. The identity of 
these non-collagenous fibril components is still unknown. Conceivably, they 
include glycoproteins or proteoglycans such as the members of the family of 
proteins with leucine-rich repeat motifs. In fact, decorin is known to occur 
preferentially in the interterritorial zones of cartilage matrix, which are rich in the 
largest banded fibrils lacking collagen IX, as visualized by immunoelectron 
microscopy [47]. This led to the hypothesis that interterritorial banded fibrils 
arise from small prototypic collagen IX-containing fibrils after proteolytic 
elimination of collagen IX or, at least, its aminoterminal Col3- and NC4-domains 
studding the fibril surface. After accommodation of such processed prototypic 
fibrils into the D-periodic stagger, fusion is thought to occur accompanied by a 
polyanionic conditioning of the fibril surface by incorporation of the proteo- 
glycan decorin. This mechanism of lateral growth would eliminate the necessity 
of diffusion of procollagens and procollagen-processing proteinases over large 
expanses of dense fibrillar cartilage networks. However, it would also neces- 
sitate the selective presence of fibril-processing proteinases at the boundaries 
of territorial and interterritorial zones of cartilage matrix. A challenging task 
of the future will be the identification of the hypothetical proteinases and the 
elucidation of their regulated action. 


5.2 
Corneal Fibril Formation 


The mature corneal stroma is composed of a single, homogeneous population 
of small diameter collagen fibrils organized as orthogonal layers. This stroma 
provides for the mechanical stability of the anterior eye and for corneal trans- 
parency. Both of these properties are dependent on the supramolecular orga- 
nization of collagen into the composite fibrils characteristic of this extracellu- 
lar matrix. The corneal fibrils are alloys assembled from collagens I and V. 
These collagen I/V fibrils interact with FACIT collagens, types XII and/or XIV 
depending on developmental stage. In addition, the small leucine-rich proteoglycans 
decorin, lumican, keratocan, and osteoglycin interact with the fibril surface 
(Fig. 4). These heteropolymeric fibrils provide a tissue-specific composite 
fibril that is responsible for the unique properties of the cornea. The formation 
of this supramolecular aggregate is a multistep assembly process involving 
fibril initiation and initial assembly of a fibril intermediate, followed by linear 
fibril growth with a lack of lateral fibril growth. Fibril initiation and initial 
assembly of a fibril intermediate are dependent on the collagen-collagen interactions 
that produce the alloy, while the later fibril growth steps are dependent on 
fibril-associated macromolecules. 

Fig. 4A, B Corneal fibril: A corneal fibrils are heterotypic, co-assembled from collagens I and 
V. Collagen V is a quantitatively minor component of most collagen I-containing fibrils. It has 
a retained N-terminal, non-collagenous domain that must be in/on the gap region/fibril 
surface. The heterotypic interaction is involved in efficient initiation of fibril assembly; B the 
heterotypic alloy forms the core of a composite fibril with fibril-associated leucine-rich repeat 
proteoglycans and FACIT collagens bound to the surface. While the heterotypic composition 
is relatively constant, the fibril-associated macromolecules are more dynamic, changing 
temporally during development or repair and spatially in different tissues/tissue domains 

Fibril assembly involves collagen-collagen interactions. The corneal keratocytes 
synthesize two fibril-forming collagens. Collagen I is the quantitatively 
major collagen, making up 80-90% of the total, while the [alpha1(V)]2alpha2(V) 
isoform of collagen V is the quantitatively minor fibril-forming collagen [48, 49]. 
These two collagens co-assemble to form a heterotypic fibril [50, 51]. This 
co-assembly regulates the formation of the initial fibril intermediate. In addition, 
corneal type I collagen differs from that found in other tissues in its elevated 
level of glycosylation [52]. This post-translational modification also affects 
regions of collagen I that are incorporated into the overlap zones of fibrils. The 
increase in diameter of individual collagen molecules introduced by the galac- 
tose and/or glucosyl-galactose moieties also is likely to affect fibrillar collagen 
organization [53]. Procollagen V is secreted and the C-propeptide is processed. 
However, unlike collagen I, the amino-terminal non-collagenous domain is 
only partially processed, with the cysteine-rich domain removed. An amino- 
terminal domain is retained containing a terminal tyrosine-rich, globular 
domain, a rigid rod-like domain and a flexible hinge region. This retained 
amino-terminal domain is responsible for most of the regulatory activity of the 
type V collagen molecule [54]. 

Collagens I and V co-assemble so that the type V collagen triple 
helix is internalized within the fibril while the amino-terminal domain projects 
through the gap region and is present on the fibril surface [49, 51]. The helical 
domain of collagen V is approximately 10% longer than the collagen I domain 
and does not perfectly fit a quarter-stagger arrangement with type I collagen 
[55, 56]. Properties intrinsic to these collagen-collagen interactions regulate the 
initiation and assembly of the initial fibril (intermediate). In situ experiments 
with corneal keratocytes demonstrated that reduction in the type V content in- 
creased fibril diameter [57]. In vitro self assembly assays using purified native 
type I and V collagen demonstrated that fibril diameter decreased as the type V 
collagen content increased. Removal of the N-terminal domain 
of type V collagen abolished most of this regulatory effect [49]. However, 
the co-polymerization of the type V triple helix with type I collagen retained 
a limited ability to influence fibril diameter. This effect may be related to the 
longer length of the type V triple helix relative to the type I helix and therefore 
less organized molecular packing. A strong relationship between regularity 
of molecular packing and larger aggregates has been shown in a number of sys- 
tems [58, 59]. Therefore, the less regular packing possible with the heterotypic 
mixture would be related to smaller diameter fibrils. However, the N-terminal 
domain was required for most of the regulation of diameter. Molecular assem- 
bly of fibrils would require that this non-collagenous domain be present on the 
fibril surface. The presence of this domain on the fibril surface could limit fib- 
ril diameter via steric or electrostatic mechanisms. However, the quantitatively 
minor collagen may exert this regulatory influence by controlling nucleation of 
fibril assembly. The manipulation of collagen I/V ratios using stratified cultures 
of human cells haplo-insufficient in collagen V also demonstrated an inverse 
relationship between type V collagen content and fibril diameter [60]. Halving 
the collagen V decreased the initiation events; with a constant pool of type I 
collagen the result was assembly of fewer, larger diameter fibrils. This indicates 
that the interaction of collagen V with collagen I is required for the efficient ini- 
tiation of fibril assembly. Therefore, the regulation of fibril diameter by colla- 
gen V is related to controlling the number of nucleation events. In the cornea, 
the high content of collagen V relative to other collagen I-containing extracellular 
matrices, i.e., 10-20% vs 1-2%, provides for the initiation of large numbers 
of small diameter fibrils. These immature fibrils are short intermediates forming 
the building blocks for the mature fibrils. 
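
The nucleation argument above can be illustrated with a toy mass-balance model. The sketch makes assumptions that are not in the source: fibrils are treated as uniform cylinders of equal length, the whole collagen I pool is incorporated, and each nucleation event yields exactly one fibril. The function names are hypothetical.

```python
import math


def fibril_diameter(pool_volume: float, n_nuclei: int, fibril_length: float) -> float:
    """Diameter of each fibril when a fixed pool is shared by n_nuclei fibrils.

    Toy model: pool_volume = n_nuclei * (pi * d**2 / 4) * fibril_length,
    solved for d. Units are arbitrary but must be consistent.
    """
    if n_nuclei <= 0 or fibril_length <= 0 or pool_volume < 0:
        raise ValueError("inputs must be positive")
    cross_section = pool_volume / (n_nuclei * fibril_length)
    return 2.0 * math.sqrt(cross_section / math.pi)


def diameter_fold_change(n_before: int, n_after: int) -> float:
    """Fold change in diameter when only the number of nuclei changes."""
    return math.sqrt(n_before / n_after)


# Halving the nucleation events with a constant pool and fibril length
fold = diameter_fold_change(n_before=2000, n_after=1000)
print(f"diameter fold change: {fold:.2f}")
```

In this simplified picture, halving the number of nucleation events increases fibril diameter by a factor of the square root of two, consistent in direction with the fewer, larger diameter fibrils reported for collagen V haplo-insufficient cells. The real system need not follow this scaling quantitatively.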

Collagen fibrils are initially formed as short fibril intermediates. Mature fib- 
rils form via linear and lateral growth of the preformed intermediates [49, 61]. 
In cornea, transparency is dependent on the maintenance of small diameter 
fibrils and the mechanical properties require an increase in fibril length. The 
leucine-rich repeat proteoglycans interact with the fibril surface and have been 
implicated in the regulation of the later stages in collagen fibril growth. When 
corneal fibrils were isolated from the corneal stroma and stripped of fibril-associated 
molecules, increases in both length and diameter were observed [49]. 
This indicates that the presence of the type V collagen N-terminal domain does not 
regulate later steps in fibril assembly. Decorin, lumican, keratocan and osteoglycin 
are all expressed in the corneal stroma and presumably all are bound 
to fibril surfaces [14, 62-64]. The interaction of the proteoglycans with a fibril 
forms a composite structure. This composite structure is important in the 
regulation of fibril growth, fibril packing and stromal hydration. 

The proteoglycans found on the fibril surface function in maintaining the 
small homogeneous fibril diameters necessary for transparency. Lumican-, keratocan- 
and osteoglycin-deficient mouse corneas all show increases in corneal 
fibril diameter [63-65]. However, only lumican-deficient corneas have major 
alterations in fibril and stromal structure associated with corneal opacity. 
Lumican-deficient mouse corneas have large irregular fibrils present in the 
stroma [62, 65]. These fibrils are indicative of a loss of diameter control. Since 
corneal fibrils grow in length, but not diameter, this indicates that the presence 
of lumican, as part of the collagen suprastructure in the cornea, is a necessary 
stabilizer/inhibitor of lateral fibril growth. The restriction of the major defects 
to the posterior stroma where lumican expression also is restricted in the wild 
type adult suggests that there are other comparable interactions involved in the 
anterior stroma. This indicates that different composites are found with dif- 
ferent spatial distributions. 

Transparency also depends on the rigid maintenance of stromal hydration. 
The charged proteoglycans bind water and keratan sulfate proteoglycans have 
essential roles in the regulation of corneal hydration. Another feature of the 
stroma necessary for function is the regular packing of stromal fibrils into 
regular orthogonal lattices. The surface associated proteoglycans have been 
implicated in this level of organization as well. In addition, type XII collagen, 
a FACIT collagen, is homogeneously expressed throughout the mature corneal 
stroma. The large non-collagenous domains of the fibril-associated molecules 
also have been implicated in the regulation of fibril packing. 


5.3 
Tendon Fibril Formation 


In contrast to the cornea, the collagen fibrils in the tendon have significant 
increases in diameter during development and growth. The mature tendon 
contains uniaxial fibrils with a very heterogeneous population of different size 
fibrils. The mechanical properties of the tendon are dependent on the increases 
in fibril diameter seen with development. Tendon fibroblasts express collagen 
I as the quantitatively major fibril-forming collagen and minor amounts of 
collagens V and III. Both form heterotypic alloys with collagen I and have the 
retained/slowly processed N-terminal domains typical of the regulatory fibril- 
forming collagens. The nucleation of fibril assembly presumably proceeds as 
has been described in cornea, assembling relatively small diameter, short fib- 
ril intermediates. 

During tendon development there are changing expression patterns for 
FACIT collagens and the leucine-rich proteoglycans [66-68]. Collagen XIV is 
expressed during early development followed by little if any expression. In the 
mouse, both biglycan and lumican are expressed at their highest levels during 
development and then decrease rapidly to unmeasurable amounts with maturation. 
In contrast, expression of both decorin and fibromodulin increases 
during maturation and then remains stable. These changing expression pat- 
terns are reflected in the changing nature of the composite collagen fibrils 
during this period. Tendon fibril structures are abnormal in each strain of mice 
deficient in one of the four proteoglycans, consistent with abnormal regulation 
of fibril growth in the absence of any of the four proteoglycans [14, 69, 70]. 
These abnormalities are associated with alterations in the biomechanical 
strength of the tendons [69, 71]. The changing composite structure is therefore 
critical to the regulation of the linear and lateral growth steps necessary for 
normal tendon structure and therefore function. 

The involvement of cellular structures in the deposition of tendon fibrils 
also has been the subject of several major papers [61]. Earlier investigations 
indicated that the formation of primordial fibrils with a homogeneous 
diameter of ca. 30 nm occurred within deep recesses of tendon fibroblasts 
[72, 73]. This was confirmed in a recent report where intracellular processing 
of procollagen within elongated Golgi-to-plasma membrane compartments 
(GPCs) was demonstrated. Extrusion from the cells and deposition into pre-ex- 
isting hexagonal arrays occurred through the formation of cellular protrusions, 
termed “fibripositors”, that were formed by fusion of GPCs containing single 
or several small-diameter fibrils with the plasma membrane. Thus, a contigu- 
ity between novel organelles achieving fibril parallelism and extracellular fib- 
ril bundles was established. Fibripositors were prominent only during fetal 
development when fibril formation in tendons was at its peak, but were not 
observed during adolescence when tendon growth is the predominant event 
[74]. However, in analogy to sickling of erythrocytes containing deoxygenated 
hemoglobin S-fibers, generation of fibripositors may be a consequence of fib- 
ril self-assembly initiated intracellularly rather than a truly instructive process 
initiated by cytoskeletal structures and aligning parallel fibril bundles. In 
addition, the mechanism is not known whereby the primordial 30-nm fibrils 
being hexagonal cylindrical objects themselves are kept from fusing into large 
fibril bundles within the hexagonal packing into which they are arranged 
extracellularly. Since decorin-deficiency leads to abnormalities in fibrillar 
packing, it may well be that this or other small leucine-rich proteoglycans sep- 
arate primordial fibrils from each other and tether them into the hexagonal ar- 
rays seen in channels between fetal tendon fibroblasts. 


6 
Concluding Remarks 


There are at least 27 different collagens that can be grouped into subfamilies 
including: fibrillar collagens, FACIT collagens, and network-forming collagens 
that were the focus of this review. This diversity, alone, would generate nu- 
merous different collagen suprastructures. Increased complexity is obtained at 
the level of alternative splicing and promoter use as well as post-translational 
modifications. In addition, the heteropolymeric nature of collagen supra- 
structures provides a mechanism for even greater diversity. The tissue-, 
domain- and developmental-specificity exhibited at all levels provides an 
extraordinary diversity in the building blocks of the extracellular matrix. 
A major challenge will be to elucidate the progressive changes in collagen 
suprastructures necessary during normal development as well as tissue repair 
and regeneration. A further understanding of the basis of connective tissue 
diseases also will require a definition of the interactions involved in assembly 
of tissue-specific suprastructures. 

Abstract Collagen is the main source of extracellular support for multicellular animals. 
The mechanical strength of collagen fibrils depends on a highly regulated mechanism of 
intermolecular cross-linking. The basis of this cross-linking, from the most primitive to the 
most advanced multicellular animals and across a diversity of vertebrate tissue types, is the 
formation of covalent bonds from aldehydes produced from lysyl and hydroxylysyl side-chains 
by lysyl oxidase. In the last decade it has become clear that such bonds form not only 
between collagen molecules of the same type in homopolymeric fibrils but also between different 
types of collagen molecule that have evolved to interact and form heteromeric structures. 
Furthermore, cross-linking amino acids, and peptides containing them that arise from collagen 
degradation, have received attention as bone resorption biomarkers in clinical studies and 
drug trials in the osteoporosis field. This review summarizes recent research directions with 
examples of advances in understanding complex interactions in cartilage collagen and the 
role of lysyl hydroxylase isoforms in regulating the pathway of cross-linking chemistry. 


Keywords Extracellular matrix - Lysyl oxidase - Lysyl hydroxylases - Pyridinolines - Bone 


1 
Introduction 


Collagen fibrils provide the mechanical support that enabled large multicellular 
animals to evolve on earth [1, 2]. The tensile strength of collagen depends on 
the formation of covalent intermolecular cross-links between the individual 
protein subunits [3]. All the fibril-forming collagen types in higher vertebrates 
(types I, II, III, V and XI) are cross-linked through a mechanism based on the
reactions of aldehydes generated enzymically from lysine (or hydroxylysine) 
side-chains by lysyl oxidase [4, 5]. This pathway operates in the collagen 
fibrils of sponges (Porifera), the most primitive extant multicellular animals, 
through to mammals [6, 7]. Certain other collagen types (e.g., collagen type IX 
of cartilage) are also cross-linked by the lysyl oxidase mechanism. In the last 
decade, perhaps the most important conceptual advance in this field is that dif- 
ferent collagen types have evolved to cross-link heterotypically in the assem- 
bly of multi-component fibrils. Templates of one class of collagen (type V/XI 
microfibrils) provide the scaffold on which the more recently evolved fibrillar 
collagen molecules (types I and II) co-assemble to form large fibrils. In cartilage, 
a third type of collagen, type IX (of the FACIT sub-family), becomes cross-linked 
to the surface of this copolymer and all three molecular types are cross-linked 
internally and to each other through the lysyl oxidase mechanism. In consider- 
ing the evolution of the different molecular classes of collagen, it is clear that 
gain-of-function mutations in the protein subunits have created new cross-linking
opportunities for heterotypic intermolecular bonding. This presumably
extends the range of collagen functional properties cells can express. 

In addition to physiologically regulated cross-linking on synthesis by lysyl 
oxidase, collagens are also susceptible over time to further cross-linking 
through the undesirable reactions of reducing sugars [8-13], particularly glu- 
cose (non-enzymic glycosylation or glycation) and lipid oxidation products 
[14, 15]. There is an extensive and growing literature on the chemistry of such 
age-related changes, which in general have pathological effects, but this chem- 
istry will not be reviewed here. 


2 
Fibril-Forming Collagens (Types I, II, III, V/XI)


Most of the research defining the cross-linking pathways from precursor lysine 
and hydroxylysine aldehydes in bulk collagen fibrils of major connective tissues 
was done over 20 years ago (reviewed in [4, 16-19]). The basic pathways are 
shown in Figs. 1 and 2. In the last decade the prominence of pyrrole cross-links 
as maturation products in bone collagen and their molecular location have 
been established [20]. 

[Figure 1: reaction scheme. Precursors (telopeptide lysine/hydroxylysine and helix
hydroxylysine/lysine) → telopeptide hydroxyallysine → divalent (initial) cross-links
(hydroxylysino-ketonorleucine, lysino-ketonorleucine) → trivalent (mature) cross-links
(pyridinolines, pyrroles)]

Fig. 1 The hydroxyallysine cross-linking pathway. Hydroxylysine residues are the source of
aldehydes formed by lysyl oxidase for intermolecular cross-linking reactions. Mature
cross-links are trivalent pyridinolines. Bone collagen is unusual in that pyrrole cross-links are
also prominent because both hydroxyallysine and allysine participate as precursors

[Figure 2: reaction scheme. Precursors (helix hydroxylysine, telopeptide lysine, helix
histidine) → lysyl oxidase product (telopeptide allysine) → divalent (initial) cross-links
(hydroxylysino-norleucine, intramolecular dimers) → trivalent (mature) cross-link
(histidinyl hydroxylysino-norleucine)]

Fig. 2 The allysine cross-linking pathway. Lysine residues are the source of aldehydes formed
by lysyl oxidase for intermolecular cross-linking reactions. Histidine can participate in
mature cross-link formation, notably in skin collagen

The two basic pathways, lysine aldehyde vs hydroxylysine aldehyde-initiated,
appear in general to occur in loose vs stiff connective tissues respectively. In
detail, the tissue specificities and the pathways are more diversified, with
elements of both pathways and both precursor aldehydes combined in some 
specialized tissues, for example in bone type I collagen [20]. Historically, the 
lysine aldehyde pathway (Fig. 2) was easier to study since the initially-formed 
aldimines are cleaved at low pH allowing the collagen monomers to be solubi- 
lized from rat skin and tail tendon in 0.5 M acetic acid [19]. On the hydroxyly- 
sine aldehyde pathway (Fig. 1), neither the initial cross-links nor their matu- 
ration products were labile so these collagens were insoluble making molecular 
analysis more difficult. The natural fluorescence of the pyridinoline cross-links
(Fig. 1) allowed peptides to be isolated and their molecular sites of origin
to be worked out [21-23]. In the last decade most of the attention paid in the
literature to pyridinoline residues, and to collagen cross-links in general, has come
from publications reporting assays of these residues and peptides containing them
in blood and urine as biomarkers of bone resorption and connective tissue
degradation. Because pyridinolines cannot be metabolized, their levels in blood 
and urine provide a measure of the amount of collagen and hence tissue from 
which they were proteolytically derived. 

Variations in cross-linking chemistry appear to be more tissue-specific than 
collagen type-specific. This is understandable if a single cell type synthesizing 
multiple collagens passes them through the same processing enzymes and 
endoplasmic reticular pathway. The basic pathway of cross-linking is regulated 
primarily by the hydroxylation pattern of telopeptide and triple-helix domain 
lysine residues. The flanking sequences around the cross-linking lysine residues, 
however, can affect the ensuing chemistry. An example is the participation of a 
histidine residue in the formation of the mature trivalent cross-links found in 
skin type I collagen (Fig. 2). A histidine in the α2(I) chain (of a third collagen
molecule) reacts with a vicinal aldimine cross-link formed between a lysine 
aldehyde and a hydroxylysine residue in two 4D-staggered collagen I molecules 
[24, 25]. The pitch of collagen molecules packed in skin collagen fibrils is 
thought to facilitate this addition reaction to histidine [26], but whether it is 
fortuitous or provides a functional advantage to dermal collagen is unknown. 
Older studies had observed that the helical domain hydroxylysine at this location
in the α1(I) chain was glycosylated in skin, but not in tendon [27]. Notably,
the histidine-containing mature cross-link HHL (histidinyl hydroxylysino-norleucine)
is not found in tendon collagen, which raises the possibility that
glycosylation might drive HHL formation. It has been observed that pyridino- 
line residues (hydroxylysyl pyridinoline) can still be formed when glycosylated 
hydroxylysine occurs in the helix, for example at the C-telopeptide to N-helix 
site in bone collagen [20]. But at the N-telopeptide to C-helix cross-linking site, 
the helical hydroxylysine is not glycosylated. Again the biological significance 
of these differences is unknown. 

Edman N-terminal sequence analysis of cross-linked peptides isolated from 
protease digests of tissue collagens has revealed intermolecular cross-linking 
between types I and III collagens (e.g., from aorta [28, 29]) and types I and II 
collagens (e.g., from intervertebral disc [30]). Unequivocally proving covalent 
structures rather than peptide mixtures is difficult using chromatography and
N-terminal sequence analysis alone. Mass spectrometry promises to be a valu- 
able tool in such studies. Preliminary data from purified cross-linked trimeric 
peptides indicate characteristic MS/MS fragmentation patterns. Figure 3 shows 
an example of MS/MS data from tandem mass spectrometry of a homotypic 
fragment from human cartilage type II collagen. 

Types V and XI collagens are quantitatively minor molecular species that are 
found copolymerized with collagens I and II respectively, and are believed to 
form a filamentous scaffold or template on which the bulk fibrillar collagens 
co-polymerize. The gene products represented in these molecules can in fact 
assemble in a variety of heterotrimeric chain combinations, not just type V 
([α1(V)]₂α2(V)) and type XI ([α1(XI)][α2(XI)][α3(XI)]) molecules, but also
novel tissue-specific chain associations [31-33]. Collagen type V/XI is best 
considered as a distinct sub-family of the fibril-forming collagen molecules. 
The same basic pattern of cross-linking, employing lysyl oxidase, and the 
telopeptide-to-helix location of intermolecular bonds occurs in these mole- 
cules. Homologous cross-linking sites (lysines) to those in types I, II and III 
collagens are evident in the protein sequences (Fig. 4), consistent with their 
evolutionary origin from a single founder gene [2]. 

Analyses of cross-linked peptides isolated from type XI collagen of cartilage
showed only divalent cross-links (hydroxylysino-5-keto-norleucine), not the
mature form of pyridinoline (hydroxylysyl pyridinoline) that predominates in
type II collagen, the bulk fibril-forming subunit of the copolymer [34]. In addition,
analysis of cross-linked peptides showed that most of the intermolecular
bonds had formed between collagen XI molecules (N-telopeptide to C-helix).
Any inter-type cross-linking was between type II C-telopeptides and the type XI
N-helical site. This is best interpreted if collagen XI had self-polymerized to form
its own filamentous network, at least initially, consistent with its suspected role
as a template for the growth of thick banded fibrils. Similar properties have been
found for type V collagen isolated from bone [35]. The primary cross-links were
divalent (hydroxylysino-5-keto-norleucine) and they linked type V molecules
between N-telopeptides and the C-helix site in a head-to-tail manner. Cross-links
from the C-telopeptide of type I collagen to the N-helix site of type V were also
found. The data on mature bone fit a special role for a hybrid type V/XI collagen
molecule, [α1(V)α1(XI)α2(V)] [32], as a component of the filamentous template
of the type I collagen-based fibrillar network of bone matrix.

Type   Chain     N-telo      N-helix   C-helix   C-telo

I      α1(I)     YDEKSTGGI   GMKGHR    GIKGHR    PPQEKAHDG
       α2(I)     YDGKGVGLG   GFKGIR    GLKGHN    no lysine
II     α1(II)    FDEKAGGAQ   GVKGHR    GLKGHR    GPREKGPDP
III    α1(III)   YDVKSGVAV   GMKGHR    GIKGHR    IGGEKAGGF
V/XI   α1(V)     AGSKGPMVS   GEKGHR    GEKGHP    no lysine
       α2(V)     LDEKSGLGS   GLKGHR    GQKGHR    no lysine
       α3(V)     GSFKGPPVS   GEKGOR    GEKGHI    no lysine
       α1(XI)    DGSKGPTIS   GDKGHR    GEKGHP    PILSSKKTRR (1)
       α2(XI)    GGDKGPVVA   GEKGHR    GEKGHP    PIQMPKKTRR (1)

(1) Lysines appear not to be involved in cross-linking

Fig. 4 Amino acid sequences at the four primary cross-linking sites in fibrillar collagens.
Related sequence motifs can be found in all the fibrillar collagen gene products
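
The homology among the cross-linking sites listed in Fig. 4 can be checked with a few lines of code. The Python sketch below is illustrative only and is not part of the original analysis: the dictionary `FIG4_MOTIFS`, the helper names and the choice of the K-G-H triplet as a test are assumptions introduced here, and only a subset of the chains in Fig. 4 is transcribed (with ASCII `a1` standing for α1). It locates the lysine (K) in each motif and reports which helix-site motifs lack a K-G-H triplet.

```python
# Subset of the Fig. 4 motifs (one-letter amino acid code).
# None marks a C-telopeptide listed as having no lysine.
SITES = ("N-telo", "N-helix", "C-helix", "C-telo")

FIG4_MOTIFS = {
    "a1(I)":   ("YDEKSTGGI", "GMKGHR", "GIKGHR", "PPQEKAHDG"),
    "a2(I)":   ("YDGKGVGLG", "GFKGIR", "GLKGHN", None),
    "a1(II)":  ("FDEKAGGAQ", "GVKGHR", "GLKGHR", "GPREKGPDP"),
    "a1(III)": ("YDVKSGVAV", "GMKGHR", "GIKGHR", "IGGEKAGGF"),
    "a1(V)":   ("AGSKGPMVS", "GEKGHR", "GEKGHP", None),
    "a2(V)":   ("LDEKSGLGS", "GLKGHR", "GQKGHR", None),
}


def lysine_positions(motif):
    """Return the 0-based indices of lysine (K) in a motif; [] for None."""
    if motif is None:
        return []
    return [i for i, residue in enumerate(motif) if residue == "K"]


def helix_sites_without_kgh(motifs):
    """List (chain, site) pairs whose helix motif lacks the K-G-H triplet."""
    missing = []
    for chain, seqs in motifs.items():
        for site, seq in zip(SITES, seqs):
            if site.endswith("helix") and "KGH" not in seq:
                missing.append((chain, site))
    return missing


if __name__ == "__main__":
    for chain, seqs in FIG4_MOTIFS.items():
        found = {site: lysine_positions(seq) for site, seq in zip(SITES, seqs)}
        print(chain, found)
    print("helix sites lacking KGH:", helix_sites_without_kgh(FIG4_MOTIFS))
```

In this subset every helix-site motif except the α2(I) N-helix sequence contains K-G-H, which is one simple way of seeing the conservation that the figure is meant to convey.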


3 
Fibril-Associated Type IX Collagen of Cartilage 


In addition to the fibril-forming collagen molecules (and basement membrane 
type IV collagen), 30 or more other molecular forms of collagen function in the 
extracellular matrix or at cell surfaces. Of these, only type IX collagen has been 
established to use the lysyl oxidase-mediated pathway of cross-linking. Within 
the diverse FACIT sub-family of collagen molecules (fibril-associated collagen 
with interrupted triple-helix [36]) therefore, collagen IX is the only member 
known to cross-link by the lysyl oxidase mechanism. Collagen IX cross-linking 
is extensive, highly evolved and presumably central to the protein's function as 
an adapter molecule on the surface of nascent type II collagen fibrils. All three 
collagen IX chains, which in birds and mammals are each the product of a 
distinct gene, appear to have diverged from a common ancestor after earlier 
whole or partial genome duplications. Each chain has two or three sites through 
which lysine-mediated cross-links can form and each displays unique speci- 
ficity in its evolved cross-linking interactions. Through a series of studies over 
the last decade, we have defined the cross-linking sites and nature of the cross- 
linking of collagen type IX [37-41] (Fig. 5). The experimental approach has 
focused on isolating and structurally identifying peptides containing cross-
linking residues (using fluorescence to track pyridinolines, and NaB³H₄ to
stabilize and tritium-label divalent cross-links). Conventional Edman-sequenc- 
ing and, more recently, liquid chromatography/electrospray mass spectrometry 
(LCMS [41]) have been the methods of choice. LCMS is clearly a powerful tool 
for the future for defining complex cross-linking mechanisms in collagens and 
other extracellular matrix proteins. 

Collagen IX has evolved to cross-link to collagen type II. The positioning of 
cross-linking sites along the molecule predicts their spatial inter-relationship. 
Figure 6 shows a molecular interaction model that can accommodate all six 
known cross-linking sites, and their interaction partner residues in type II col- 
lagen and other type IX collagen molecules. This packing arrangement requires 
an antiparallel relationship between the central type IX COL2 triple-helical domain
and type II collagen molecules on a fibril surface and the type IX COL1
domain to be folded back on the COL2 domain as shown. All the known bonds
[22, 37, 38, 41] can then be accommodated with the correct axial spatial alignments
for cross-link formation.
[Figure 5: schematic of the type IX collagen molecule showing the peptide sequences
around each cross-linking lysine in the α1(IX), α2(IX) and α3(IX) chains (COL2 and NC1
domains) and their partner sites in type II collagen (N-telopeptide and helix) or in
other type IX collagen molecules]

Fig. 5 Cross-linking sites in type IX collagen. Seven sites (hydroxylysine or lysine residues)
in total have been identified, two each in α1(IX) and α2(IX) and three in α3(IX)


Clearly it will be important to define the molecular interactions that regu- 
late this complex cross-linking cascade, which also includes the participation 
of collagen type XI, acting as a template filament within the composite fibril. 
Are monomers of collagen IX transported extracellularly before they interact 
with nascent (thin) collagen II/XI co-assemblies, or is a discrete element of the 
collagen II/IX/XI heteromer pre-fabricated inside the cell in a secretory com- 
partment, which can polymerize extracellularly with the addition of type II col- 
lagen monomers? It is known that collagens IX and XI are most concentrated 
(with their highest ratios to collagen type II) in developing cartilage and in the 
immediate pericellular zone of chondrocytes [42-44]. As cartilage matures, the 
ratios of collagen II to collagens XI and IX rise to about 96:3:1 from 80:10:10 in 
fetal cartilage [39]. 
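
To make the size of this developmental shift explicit, the quoted proportions can be converted into ratios relative to type II collagen. The following Python sketch is a simple arithmetic illustration; the names `FETAL`, `MATURE`, `ratio_to_type_ii` and `fold_decrease` are introduced here for convenience, and the input values are just the approximate II:XI:IX proportions given in the text.

```python
from fractions import Fraction

# Approximate II:XI:IX proportions quoted in the text [39].
FETAL = {"II": 80, "XI": 10, "IX": 10}
MATURE = {"II": 96, "XI": 3, "IX": 1}


def ratio_to_type_ii(composition, collagen):
    """Amount of the given collagen per unit of type II collagen."""
    return Fraction(composition[collagen], composition["II"])


def fold_decrease(collagen):
    """Factor by which the collagen-to-type-II ratio falls from fetal to mature."""
    return ratio_to_type_ii(FETAL, collagen) / ratio_to_type_ii(MATURE, collagen)


if __name__ == "__main__":
    for collagen in ("XI", "IX"):
        print(collagen, float(fold_decrease(collagen)))
```

On these figures the ratio of collagen XI to collagen II falls about 4-fold and that of collagen IX to collagen II about 12-fold between fetal and mature cartilage.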

What is the function of collagen IX? Why are covalent bonds needed to 
anchor it to the surface of collagen II fibrils and to link adjacent collagen IX 
molecules? A mechanical role in strengthening the fibrillar matrix is reasonable 
to suspect. The interaction scheme shown in Fig. 6 suggests how interfibrillar 
cross-links might also form and so add to network stability, but the existence 
of such bonding will be hard to rule in or out. It does appear that the presence 
of collagen IX is not crucial for skeletal growth, since mice engineered with ho- 
mozygous null genes for COL9A1, in which type IX collagen is functionally 
knocked out [45], appear normal at birth [45, 46]. They do, however, develop os- 
teoarthritis with progressive destruction of joint cartilages [46], as do mice and 
humans expressing mutated forms of collagen IX genes [47, 48]. Collagen IX 
therefore may be required for mechanically durable mature cartilages. Perhaps 
the three-dimensional meshwork of developing fibrils is locked as a template 
outside the cell through the covalent interactions of collagen IX. Without a 
covalently-stabilized template, the mature collagen network may be more sus- 
ceptible to proteolysis, detrimental effects of mechanical loads and less able to 
endure repetitive injury and repair cycles. 
Fig. 6 A proposed model of interaction of type IX collagen with type II collagen that can
accommodate all the cross-linking sites shown in Fig. 5. Folding back of the type IX COL1
domain on COL2 and an antiparallel alignment of the molecule on the surface of the type II
collagen polymer can accommodate all the cross-linking interactions in the type IX collagen
molecule. Potential inter-fibrillar bonds are also illustrated. The type II collagen fibril surface
is shown acting as a template to bind and position adjacent type IX collagen molecules to
cross-link to each other




The complex structure of the cartilage heterofibril network presents a chal- 
lenge for understanding how the various enzyme-controlled steps in fibril 
growth are orchestrated. Lysyl oxidase must continue to create aldehydes on 
multiple sites on the interacting structural molecules without interfering 
spatially with the addition of new subunits or the propeptidases that remove 
the globular propeptides from type II and type XI procollagen molecules. The procession
of protein interactions driving these and other activities on the fibril surface 
will be important to understand. 

Inspection of the gene sequences for the other FACIT family members 
reveals no obvious homologies to suggest candidate cross-linking lysines in 
domains comparable to those in type IX collagen. One possible exception is
COL21A1, which has a candidate lysine in its C-terminal NC1 domain; this domain
is also short, as in the three type IX collagen chains [49]. Little is yet known
about the tissue distribution and properties of collagen XXI, for example 
whether it binds to collagen fibrils and so could act in a similar manner to 
collagen IX. It does not appear to be present in cartilage. 


4 
Basement Membrane Type IV Collagen 


Basement membrane collagens are ancient (>500 million years [50, 51]), having
evolved in primitive metazoa as early as or earlier than the fibril-forming collagens.
Hydra, a simple organism formed from two cell layers that secrete and
sandwich the mesoglea, an extracellular layer, has been shown to express genes 
for a basement membrane collagen that is homologous in sequence to vertebrate 
type IV collagen and for a fibril-forming collagen [52, 53]. 

The open molecular networks that collagen type IV molecules form [54, 55]
provide the framework for an assortment of proteins and proteoglycans that 
characterize the basement laminae that underlie most endothelial and epithe- 
lial cell layers. Early work indicated that vertebrate type IV collagen molecules 
are cross-linked covalently by disulfide bonds and bonds derived through the 
aldehyde-initiated lysyl oxidase mechanism [56]. Lysyl oxidase-mediated cross- 
links and disulfides were found in the 7S domain, a cross-linked tetramer of 
N-terminal globular domains, and also in preparations of the dimeric C-terminal
NC1 domains. Recent work has concluded that the apparent non-reducible
cross-link (suggesting a lysyl oxidase product) in the NC1 dimer was in fact
a stable disulfide bond that required a high concentration of mercaptoethanol
to break [57]. On the other hand, a high-resolution X-ray crystallographic 
study has indicated a strong spatial association between a methionine and a 
lysine side-chain in this domain [58], suggesting a novel covalent intermole- 
cular bond although the chemical nature of the link was not determined. 
Whether lysyl oxidase-mediated cross-links ever occur in type IV collagen still 
seems to be an open question. Both divalent cross-links (hydroxylysino-5- 
keto-norleucine; [56]) and evidence for pyrroles [59] have earlier been noted, 
but the low yield of divalent cross-links implied that the main cross-links
remained unidentified. 

An alternative explanation is that the low levels of known collagen cross- 
links had come from other collagen contaminants rather than from type IV col- 
lagen itself, and that in fact type IV collagen networks are not cross-linked by 
the lysyl oxidase mechanism. To resolve this question, more definitive data on 
cross-linked purified peptides are needed. On current evidence it seems that 
cystine disulfides and strong hydrophobic interactions may explain the cross-linking
properties of type IV collagen [55]. The collagen type IV gene product
of Hydra vulgaris lacks a 7S domain in its sequence by comparison with the six
vertebrate collagen IV genes, and the stable non-reducible cross-links recently
noted in its NC1 domain [52] could be an unusually stable disulfide, as recently
concluded for this domain from vertebrate type IV collagen [57]. The highly
conserved cysteine distribution patterns in the type IV collagens of Hydra and
vertebrates support this possibility [52]. 


5 
Lysyl Hydroxylases Regulate Tissue-Dependent Patterns of Cross-Linking 


From the structures of the lysyl oxidase-mediated cross-links and tissue- 
specific differences, it has long been suspected that one or more telopeptide 
lysyl hydroxylases must exist to regulate the different cross-linking pathways. 
Only recently has direct evidence for such a gene product been found. Bank
and colleagues discovered that the defect in Bruck Syndrome, a heritable
disorder of bone resembling osteogenesis imperfecta, eliminated pyridinolines
and other hydroxylysine aldehyde-based cross-links from bone collagen
[60], implying a defect in the putative telopeptide hydroxylase. In the first
family studied, it turned out that the chromosomal locus linked to disease
expression was not the hydroxylase, and the effect was probably through an
associated gene product. This was revealed later in a study of two other
families expressing the Bruck Syndrome phenotype and the same abnormal
pattern of bone collagen cross-linking, in which disease-causing mutations
were identified in PLOD2 (procollagen-lysine 2-oxoglutarate 5-dioxygenase,
also known as lysyl hydroxylase 2 or LH2) [61], one of the three human genes
that encode lysyl hydroxylase isoforms [62-65]. PLOD2 can be expressed in
two alternative splicing variants, LH2a and LH2b, where LH2b contains the
product of an extra exon [66]. Studies on skin fibroblasts from patients with
systemic sclerosis, which features progressively fibrotic skin in which the
collagen has higher than normal levels of hydroxylysine-aldehyde cross-links,
showed overexpression of LH2b, which is concluded to be the telopeptide
hydroxylase [61].

The molecular basis of another genetic disease, Ehlers-Danlos Syndrome
type VI (EDS-VIA), caused by mutations in PLOD1 (LH1) [67-69], has helped
in understanding how the chemical quality of collagen cross-linking is controlled
in vivo. In EDS-VIA, active PLOD1 expression is eliminated and in bone
collagen, HP (hydroxylysyl pyridinoline) is replaced by LP (lysyl pyridinoline),
which means that the specific triple-helical cross-linking lysines that donate
the ring-nitrogen side-chain of pyridinolines are underhydroxylated [70, 71].
In addition, skin type I collagen of EDS-VI patients essentially lacks any
hydroxylysine residues, whereas bone type I collagen has about 50% of normal
hydroxylysine. EDS-VIA cartilage type II collagen has 90% of the hydroxylysine
content of normal cartilage but its HP/LP ratio is abnormally low [71].
Together these findings reveal tissue-specific and molecular site-specific dif- 
ferences in the relative contributions of LH1, LH2 and LH3 to triple-helical do- 
main lysine hydroxylation. Also, the product of PLOD1 (LH1) is a helical lysyl 
hydroxylase that favors lysine residues as substrates at the two triple-helical 
sites of cross-linking in fibrillar collagens. In considering how these hydrox- 
ylases may regulate cross-linking, another property that may be important is 
the apparently additional activities of LH3 as both a galactosyl transferase and 
glucosyl transferase for collagen [72-74]. Since glycosylated hydroxylysines at 
cross-linking sites can participate in cross-linking, this may turn out to be 
another regulatory step. Notably, in EDS-VIA skin collagen lacking any
hydroxylysine, the cross-link later identified as HHL (Fig. 2) was missing 
(Eyre, unpublished). This suggests that lysine cannot substitute for glycosy- 
lated hydroxylysine (which is the helix donor residue) and form the lysine 
homologue (see Fig. 2). 

In bone collagen the lysyl hydroxylase-mediated control mechanisms for 
cross-linking must be especially fine-tuned. Each triple-helical domain lysine and 
telopeptide lysine that goes on to form cross-links is partially hydroxylated in a 
site-specific manner. This produces the characteristic ratio of HP/LP and of py- 
ridinolines to pyrroles that typify bone collagen. Presumably, expression levels of 
lysyl hydroxylase 1, 2 and 3 by osteoblasts and the local sequence context of the 
individual chains in which the cross-linking lysines occur, dictate the pattern. 
Analysis of the eventual cross-link composition of peptides from each locus can 
be used to estimate the original degrees of lysine hydroxylation [20]. Figure 7
summarizes the information from such analyses of normal human bone collagen.
How this affects or relates to the lateral organization of molecules in bone
collagen fibrils and the process of mineral crystallite deposition is still not clear,
though suspected to be fundamental to bone properties [75]. In a study of LH1,
LH2 and LH3 in the rat, most tissues expressed mRNAs for all three enzymes,
implying a lack of tissue specificity in lysyl hydroxylase function [76]. However,
collagen site-specific hydroxylation is probably regulated at more than one level
of gene and protein expression.

Fig. 7 A pattern of partial hydroxylation of cross-linking lysine residues in bone collagen
regulates this tissue's distinctive cross-linking chemistry. The cross-link properties and
yields of cross-linked peptides from the four primary loci are the source for the degrees of
hydroxylation indicated [20]
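
The idea of reading lysine hydroxylation back from cross-link composition can be illustrated for the helical site. Because HP and LP differ only in whether the helix residue that donates the ring nitrogen was hydroxylysine or lysine, the fraction HP/(HP + LP) gives a rough estimate of how much of that helical lysine was hydroxylated. The Python sketch below rests on simplifying assumptions that are not taken from the source: both pyridinolines are assumed to form and to be recovered with equal efficiency, pyrroles and divalent cross-links are ignored, and the function name and example values are hypothetical.

```python
def helix_hydroxylation_fraction(hp, lp):
    """Estimate the fraction of helical cross-linking lysines that were hydroxylated.

    hp, lp: amounts of hydroxylysyl and lysyl pyridinoline in the same units.
    """
    total = hp + lp
    if hp < 0 or lp < 0 or total == 0:
        raise ValueError("hp and lp must be non-negative and not both zero")
    return hp / total


if __name__ == "__main__":
    # Hypothetical HP/LP ratios, for illustration only.
    for hp_lp_ratio in (3.0, 1.0, 0.25):
        frac = helix_hydroxylation_fraction(hp_lp_ratio, 1.0)
        print(f"HP/LP = {hp_lp_ratio}: about {frac:.0%} hydroxylated")
```

A fuller treatment would need the yields of all cross-link types at each locus, so this should be read only as a first-order sketch of the reasoning.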


6 
Bone and Mineralized Tissue Collagens 


A surge of interest in the last decade in collagen cross-links as clinical bio- 
markers of bone turnover has drawn attention to the cross-linking properties 
of bone collagen. It has long been noted that the cross-linking of collagen in 
bone, and other tissues that mineralize (dentin, calcified tendons) is distinctive, 
and probably functionally related to the intimate relationship between mineral 
crystallites and the packing arrangement of collagen molecules in fibrils [77, 78]. 
The mechanism, using both lysine aldehydes and hydroxylysine aldehydes, is 
distinctive and produces roughly equal amounts of pyridinolines and pyrroles 
as the mature cross-linking residues (see Fig. 1). Each type of cross-link is 
distributed site-specifically. Pyrroles, for example, are concentrated at the 
N-telopeptide-to-α2(I) chain C-helix [20]. Lysyl pyridinoline is also more
abundant in general at the N-telopeptide-to-C-helix locus. 

The structural quality of the trabecular architecture of human cancellous 
bone was found to be linked to the ratio of pyrrole to pyridinoline cross-links 
[79]. Also, a change in the cross-linking pattern along turkey tendons just be- 
fore they mineralize suggests a causative relationship between cross-linking 
quality and mineralization [80]. In a fracture repair model, the ratio of HP/LP 
and hydroxylysine content of the callus collagen was strongly related to the 
degree of fibril mineralization [81]. Methods for detecting cross-linking 
residues spectroscopically in sections of bone tissue using FTIR show promise 
in exploring this further [82-84]. Pyrrole cross-links, which are not stable to 
acid hydrolysis, are difficult to quantify and characterize. A method that uses 
biotinylated Ehrlich's reagent to derivatize the residues prior to isolation has 
been reported [85]. Lysyl pyridinoline, a recognized biomarker of bone collagen, 
has now been synthesized as a standard and reagent source for developing bio- 
marker immunoassays [86]. 

A further form of pyrrole cross-link, a trivalent pyrroleninone, has been 
tentatively identified in an acid hydrolysate of dentin collagen (Fig. 8) [87], in 
addition to the pyridinolines, pyrroles and ketoamines already described. 

Fig. 8 A novel pyrroleninone cross-link isolated from bovine dentin in addition to
pyridinolines, pyrroles and divalent keto-amines

In another study using an antibody that recognizes the C-telopeptide of the
α1(I) chain, both cross-linked and uncross-linked forms of this domain were
isolated from trypsin-digested human bone collagen [88]. Of the trivalent
cross-linked structures, a significant fraction could not be explained by pyrroles
or pyridinolines, implying an unidentified cross-linking residue.

Higher levels of pyridinoline cross-links and hydroxylysine were found in 
bone from patients with osteogenesis imperfecta (clinical types I, II and III) 
compared with normal bone [89]. Although this suggests no gross disturbance 
in molecular packing of collagen molecules in fibrils, the potential that colla- 
gen/mineral inter-relationships are subtly disturbed deserves more attention as 
a source of the bone fragility. 


7 
Collagen Cross-Links as Biomarkers 


Results from searching the literature on collagen cross-linking are heavily 
weighted over the last decade with reports from clinical studies that measured 
pyridinolines or cross-linked telopeptides in body fluids as molecular markers 
of bone turnover (reviewed in [90-94]). These cross-linking amino acids, and 
peptides containing them, are found in blood as products of collagen proteo- 
lysis which survive into urine. Pyridinolines are quantitatively excreted in the 
form of the free amino acids and small peptides, there being no degradation 
pathway in the liver. Initial reports showed that the total pool of pyridinolines 
(HP plus LP) in urine, quantified after acid hydrolysis by HPLC, can provide a 
more specific index of systemic bone resorption than hydroxyproline, a long- 
used marker [95-99]. Even more specific immunoassays were then introduced 
that targeted short telopeptide fragments attached to the cross-linking residues 
using specific antibodies [100-102]. It was discovered that peptide degradation 
products of the two cross-linked domains of bone collagen (N-telopeptide-to- 
helix or NTx, and C-telopeptide-to-helix or CTx) surviving into blood and urine, 
fell into discrete chromatographic pools of low molecular weight (<2 kDa) 
[100-103]. Antibodies could be tailored that recognized the core peptide com- 
ponents as neoepitopes. Immunoassays were developed that are specific to type I 
collagen (peptide sequence specificity) and to cleavage products peculiar to the
cathepsin K-initiated pathway of proteolysis which osteoclasts use to degrade 
bone collagen [104, 105]. 

Figure 9 shows an NTx peptide structure recovered from urine. Using in-line 
microbore reverse-phase HPLC and electrospray mass spectrometry, two stereo- 
isomers of the peptide can be resolved, which differ in MS/MS fragmentation 
pattern. This can be explained by an interchanged placing of their N-telopeptide 
side-arms (α1(I) and α2(I)) on the 3-hydroxy-pyridinium ring of the trivalent
cross-linking residue as shown. Mass spectrometry reveals a favored primary 
fragmentation product on MS/MS that we presume depends on the pyridinium 
ring positions of the peptide arms. This information can help in defining the 
origin and preferred path of interactions of the three precursor collagen se- 
quences giving rise to each trivalent structure. Liquid chromatography/mass 
spectrometry also resolved the HP and LP forms of the cross-links (Fig. 9 shows 
the spectra for the LP stereoisomers). 

Although bone collagen is the principal source of pyridinoline cross-links 
in urine, other tissue collagens also contribute. For example, specific fragments 
from cartilage type II collagen have been identified [106] and targeted for
immunoassay as a biomarker of cartilage breakdown [107]. We believe the main
source in urine of these discrete peptides from type II collagen is the
breakdown of mineralized cartilage by osteoclasts. Levels are extremely
high in growing children with open growth plates, supporting this conclusion 
[108]. Patients with osteoarthritis show on average higher levels than control 
subjects [109] and, in a study of high-performance athletes, runners showed 
higher levels than swimmers or rowers [110]. Accelerated joint remodeling is 
the likely explanation for the raised levels in adults. 


8 
Other Mechanisms of Collagen Cross-Linking 


8.1 
Cystine Disulfides 


For most non-fibrillar collagens of the extracellular matrix (e.g., type IV collagen, 
see earlier) cystine cross-links may be the only source of covalent intra- and in- 
ter-molecular bonds. Type VI collagen forms a characteristic banded filamentous 
network in which the chains are cross-linked as molecules, dimers and tetramers 
by disulfides. No lysine-mediated or other cross-links are present [111]. 


8.2 
Gamma-Glutamyl Lysine Cross-Links 


In addition to lysyl oxidase-mediated cross-links, there is indirect evidence that 
transglutaminase-mediated cross-links might also be involved in the process 
of polymerization of collagen networks [112-114]. Candidate glutamine sub-
strate sites have been identified for transglutaminase, but no partnering lysine 
residues for natural cross-link formation have been defined. 

Tissue transglutaminases, related to Factor XIII in the blood-clotting cascade, 
are a family of Ca²⁺-dependent enzymes thought to be active in the covalent
cross-linking of certain extracellular matrix proteins. They catalyze the linkage 
of specific glutamine and lysine side-chains by a transamidation reaction to 
form ε-(γ-glutamyl)lysine cross-links [112-114]. A role in bone
matrix is suspected [115]. Methods of identification of transglutaminase-me- 
diated cross-links rely on the use of tritiated putrescine or cadaverine to label 
candidate glutaminyl cross-linking sites. No new technique has been developed 
in the last decade. Several [³H]putrescine-binding sites have been identified in
collagens III, V, XI and XVI. The potential glutamine sites are present in the 
aminopropeptide of type III collagen, the non-triple-helical telopeptides of 
α1(V) and α1(XI) chains, and the N-terminal noncollagenous domain (NC11)
of α1(XVI) chain [116-118]. However, none of the other cross-linking partners,
the lysine sites, have been identified or suggested. 


8.3 
Tyrosine-Derived Cross-Links 


The cuticles of C. elegans and other nematodes consist of short-helix collagen
polymers cross-linked by dityrosines and trityrosines [119, 120]. The lysyl 
oxidase mechanism does not operate. Instead a peroxidase is responsible. The 
enzyme catalyzing worm cuticle collagen cross-linking is a membrane-bound 
dual oxidase/peroxidase referred to as Duox [121]. A homologue to this enzyme 
is expressed in human tissues, but whether it has a similar function in gener- 
ating extracellular di- and tri-tyrosine cross-links is unknown. Low levels 
of such cross-links can be found in vertebrate tissues but whether they are 
formed specifically or as an oxidative byproduct (e.g., in inflammation) is still 
unclear. 


8.4 
Types VIII and X Collagens 


These homologous short chain molecules form hexagonal lattices. There is 
evidence that collagen type X, restricted to the hypertrophic and mineralized 
zones of growth plate cartilages, is cross-linked by the lysyl oxidase mechanism 
[122, 123], in addition to interchain disulfide bonding. However, site-specific 
cross-linked peptides need to be isolated to establish a role for lysyl oxidase- 
mediated cross-links. It is clear, nevertheless, that unusually strong hydrophobic 
interactions occur between the C-terminal globular domains in the hexagonal 
networks that these collagens form. 


9
Outlook 


The functional significance of the differences in cross-linking chemistry be- 
tween tissue types is not clear. Imposed packing constraints on collagen mol- 
ecules through the placement of cross-links may be more significant than the 
chemistry of the links themselves. On the other hand a reversible cross-linking 
chemistry may offer benefits, for example, in facilitating the mineralization of 
collagen fibrils in bone by allowing mineral crystallites to interdigitate and 
push apart the filamentous elements making up a fibril. Pyrrole cross-links are 
also thought to confer special qualities to bone collagen fibrils, perhaps related 
to the unique capacity of bone collagen to mineralize. The function of the pyr- 
role cross-links and significance of the link observed between the microscopic 
character of bone trabeculae and the ratio of pyridinoline to pyrrole cross-links 
need further study. 

The functional significance in general of the trivalent cross-links, which can
link three adjacent collagen molecules, two in register and a third staggered by
a 4D-overlap (where D=1/4.4 of the molecular length), is not clear. These
permanent end-products of hydroxylysine-aldehyde cross-linking are associated
with tough connective tissues that bear high loads and are subject to minimal
turnover. Whether they add mechanical strength of a particular quality, for
example by preventing lateral slippage between sub-elements of fibrils, or offer
a mechanism for interfibrillar bonding, is not known.
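
The stagger geometry referred to above can be made concrete with a little
arithmetic. The following is only a minimal sketch: it assumes a molecular
length of roughly 300 nm (an approximate figure, not a value given in this
chapter), and all names in it are hypothetical. With D taken as 1/4.4 of the
molecular length, a 4D stagger leaves the ends of the staggered molecules
overlapping by about 0.4D, the short region in which telopeptide-to-helix
cross-links can be placed.

```python
# Illustrative arithmetic only. The 300 nm length and all names are assumptions,
# not values or terms taken from the chapter.
MOLECULE_LENGTH_NM = 300.0  # assumed approximate length of a fibrillar collagen molecule
D_PER_MOLECULE = 4.4        # from the text: D = 1/4.4 of the molecular length


def stagger_geometry(length_nm: float = MOLECULE_LENGTH_NM, n_periods: int = 4) -> dict:
    """Return D, the axial offset of an n_periods*D stagger, and the residual end overlap."""
    d_nm = length_nm / D_PER_MOLECULE   # axial period D
    offset_nm = n_periods * d_nm        # axial shift between the staggered molecules
    overlap_nm = length_nm - offset_nm  # length over which their ends still overlap
    return {
        "D_nm": d_nm,
        "offset_nm": offset_nm,
        "overlap_nm": overlap_nm,
        "overlap_in_D": overlap_nm / d_nm,
    }


geometry = stagger_geometry()
print({name: round(value, 1) for name, value in geometry.items()})
```

Under the assumed length, D comes out near 68 nm, the 4D offset near 273 nm,
and the residual overlap near 27 nm, that is, 0.4D.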

Perhaps the most important questions driving current research are aimed 
at understanding the cellular mechanisms that control tissue-specific differ- 
ences in collagen cross-linking chemistry. Pivotal enzymes clearly include the 
three known lysyl hydroxylases, one of which (PLOD2) is believed to act on 
telopeptide lysines and so adapt the nascent collagen to cross-link via the 
hydroxylysine aldehyde pathway. This pathway is associated with normally 
tough connective tissues and with pathological fibrotic conditions. Molecular 
mechanisms that control the expression and activities of these regulatory gene 
products in different cell types will be important to define. 